Abstract

In this paper we study the benefit of entanglement in settings involving classical in- 
puts, outputs, and communication channels from an information theoretic perspective. It 
is known that although (asymptotic) zero-error capacity of (point-to-point) classical chan- 
nels may increase when the sender and receiver are provided with shared entanglement, 
permitting an asymptotically vanishing error eliminates this benefit. In contrast we show 
that in the correlation simulation problem, entanglement is strictly beneficial even with an 
asymptotically vanishing error requirement. To accomplish this we extend a special case of 
the recent result of Yassaee et al. to the entanglement-assisted setting. Further we argue 
that studying the benefit of entanglement in multi-terminal settings requires evaluation of 
expressions involving quantum auxiliary registers. This would require bounds on the di- 
mension of the auxiliary quantum registers in a given expression. However no non-trivial 
technique for bounding the dimension of auxiliary quantum registers is known. To approach 
this problem we define the problem of quantum convexification. We show that quantum 
convexification is strictly stronger than the usual classical convexification. To prove this 
fact we develop new tools which might be useful for bounding the dimension of quantum 
registers in optimization problems involving an auxiliary quantum system. 

1 Introduction 

Entanglement is one of the most striking features of quantum mechanics [1]. Bell's theorem [5]
states that local measurements of entangled states may result in correlations which cannot be 
realized in local hidden variable models. Besides Bell's inequalities, there have been attempts to 
quantify the strength of non-local correlations by studying how much communication between 
two distant parties is necessary to generate the correlations in a classical setting where the parties 
are provided with preshared randomness (see for example [3-12]). In
particular to obtain a robust measure the average amount of communication needed to generate 
non-local correlations has been computed for certain types of correlations. These computations 
however, have a combinatorial nature; it seems hard to generalize them to arbitrary correlations. 
Nevertheless, the combinatorial structures melt away in an information theoretic formulation
that allows for an asymptotically vanishing amount of error. We therefore propose to quantify the
amount of communication required to simulate non-locality in an information theoretic setting.
This problem has recently been solved in its most general form by Yassaee et al. [13]
for the case where the parties are provided with infinite shared randomness. The case of infinite
shared entanglement is not considered in [13].

In this paper we are interested in the benefit of entanglement in classical communication 
settings from an information theoretic perspective. By classical we mean classical inputs, out- 
puts, and communication channels, and by information theoretic we mean a framework involving 
average quantities over repeated trials permitting an asymptotically vanishing error. An infor- 
mation theoretic perspective not only allows for "melting" of the combinatorial structures, but 
can also be a deciding factor on whether entanglement can be beneficial at all. For example, 
(asymptotic) zero-error capacity of (point-to-point) classical channels may increase when the
sender and receiver are provided with shared entanglement [14, 15]. Nevertheless, the (usual)
capacity of classical channels does not change in the presence of entanglement.

To study the benefit of entanglement in classical scenarios, we start with the problem of
simulating bipartite correlations via communication in the presence of shared entanglement. That
is, how much communication is required to simulate a given bipartite correlation when the two
parties are provided with (infinite) shared entanglement? While the overall task of generating
correlations is classical in nature, shared entanglement is known to help in a non-information
theoretic setup: for the case of no communication, Bell's theorem states that there are non-local
correlations that can be generated in the presence of entanglement. But how about an information
theoretic setting allowing for asymptotically vanishing error? We show that entanglement
can still help by extending the result of [13] (see Theorem 2 below). This demonstrates an odd
difference between the problems of channel capacity and simulation of non-local correlations:
entanglement helps in both cases in a non-information theoretic setup, but only helps the latter
in an information theoretic setup. To the best knowledge of the authors, simulation of bipartite
correlations is the first classical scenario in which shared entanglement helps in an information
theoretic sense.

Given the odd difference between channel capacity and correlation simulation, we look for 
other classical information theoretic settings where shared entanglement helps. In this quest one 
has to look beyond point-to-point channels to multi-terminal problems. One candidate is the 
Gray-Wyner problem whose goal is to transmit multiple correlated sources to multiple distant 
parties. Recently Winter (personal communication, 2012) has found the capacity region of the 
entanglement-assisted Gray-Wyner problem. We discuss this region in Appendix [B] but roughly 
speaking the region involves a union over auxiliary quantum registers. If we assume that the 
auxiliary subsystems are all classical random variables, we obtain the classical capacity region 
of the Gray-Wyner problem [16]. Thus the problem of whether shared entanglement helps or 
not reduces to the problem of whether auxiliary quantum registers can be replaced by classical 
random variables. 

Evaluating an expression involving auxiliary quantum registers is difficult in the Gray-Wyner
problem, and in general. To be concrete, let us start with a simple example: given two random
variables X and Y, consider the region formed by pairs $(H(X|F), H(Y|F))$ when we take the
union over all auxiliary quantum registers F. Evaluating this region by numerical brute-force
simulation requires bounds on the dimension of the Hilbert space of F. It is fair to say that no
non-trivial technique for bounding the dimension of auxiliary quantum registers is known, even
for this simple example. Therefore many of the expressions showing up in the literature are not
computable even when presented in single-letter form.¹ In this paper we develop new tools
for comparing classical and quantum expressions. We are optimistic that our tools could also
prove useful in bounding the dimension of auxiliary registers (see Section 4.4).

The main difficulty in bounding the dimension of the auxiliary quantum registers is that our
few tools from classical information theory are not readily applicable to quantum settings. The
main tool in the classical world is the Caratheodory theorem, complemented by the perturbation
method [17, Appendix C], [18] and some manipulation techniques as in [19]. The Caratheodory
theorem relies heavily on the fact that for any two random variables X and Y, the conditional
entropy $H(X|Y)$ is the convex combination $\sum_y p(y) H(X|Y=y)$; classical conditioning is a
simple convexification. But this intuition fails when we condition a random variable X on a
quantum register F.

In the second part of this paper we study quantum conditioning or "quantum convexification."
We show that quantum convexification is strictly richer than classical convexification. More
precisely we show that there exist random variables $X_1, \dots, X_n$ and an auxiliary quantum
register F such that for every auxiliary random variable C we have

\[ (H(X_1|C), \dots, H(X_n|C)) \neq (H(X_1|F), \dots, H(X_n|F)). \]

We also develop new tools to study optimization problems involving quantum registers that
might be of use in other such optimization problems. Concretely speaking, for an arbitrary
$q(x,y,z)$, we consider the optimization problem

\[ \sup_{F - X - YZ} I(F;Y) - I(F;Z), \]



¹ An expression is computable if for every $\epsilon > 0$ there is an algorithm that stops in finite time $T_\epsilon$ and outputs
the value of the expression within $\epsilon$. A finite dimensional characterization implies computability of an expression.
over all quantum registers F, where $F - X - YZ$ represents the Markov chain condition. We
show that there exists a distribution $q(x,y,z)$ such that the supremum over auxiliary quantum
registers F yields a larger value than taking the maximum of the same expression over classical
auxiliary random variables. In other words

\[ \sup_{\substack{F - X - YZ \\ F \text{ quantum}}} I(F;Y) - I(F;Z) \;>\; \max_{C - X - YZ} I(C;Y) - I(C;Z). \]

The alphabet size of the random variable X in the above example is large. However when X is
binary and the dimension of F is two, we show that for any channel $q(y,z|x)$ we have

\[ \sup_{\substack{F - X - YZ \\ \dim F = 2}} I(F;Y) - I(F;Z) \;=\; \max_{C - X - YZ} I(C;Y) - I(C;Z). \]

Our last contribution is to illustrate the possibility of an entirely different approach for
proving computable outer bounds, when proving dimension bounds on the size of auxiliary
registers is difficult. Our Theorem 2 on entanglement-assisted correlation simulation suffers
from a lack of dimension bounds. However in Section 3.5 we show that we can use "Information
Causality" of [20] indirectly to prove a computable outer bound for CHSH-type correlations.
Although it is not clear how to extend the result to non-CHSH-type correlations, we would like
to highlight the possibility of getting around dimension bounds if one is only interested in outer
bounds.

This paper is organized as follows: in Section 2 we set up our notation and recall some
preliminaries. In Section 3 we discuss the channel simulation problem, first the shared randomness
and then the shared entanglement cases. Subsection 3.5 contains our CHSH-type correlation
example where we compare the shared randomness and shared entanglement cases. Section 4
sets up a framework for discussing quantum conditioning (or "quantum convexification"). It is
followed by an example showing that "quantum convexification" is strictly richer than classical
convexification. Section 4.4 discusses the potential use of our technique to bound the dimension of
auxiliary quantum registers.



2 Preliminaries 

Classical random variables are denoted by capital letters A, B, X, Y. The set of outcomes of
X is denoted by $\mathcal{X}$, and by the size of X we mean $|\mathcal{X}|$, the size of the set $\mathcal{X}$. By $X^n = X_1 \dots X_n$
we mean n i.i.d. copies of X, and $X_{i:k}$ (for $k > i$) means $X_i X_{i+1} \dots X_k$. Outcomes of $X^n$
are denoted by $x^n = x_1, \dots, x_n$, so the outcome of the i-th random variable in $X^n$ is $x_i \in \mathcal{X}$.
The sequence $x^n = x_1 \dots x_n$ happens with probability $p(x^n) = p(x_1) \cdots p(x_n)$. To distinguish
quantum registers from classical random variables we denote them by boldface letters E, F, and
the dimension of the Hilbert space corresponding to F is denoted by $\dim F$. Again $F^n = F_1 \dots F_n$
denotes n i.i.d. copies of F, and $\rho^{\otimes n}$ denotes n i.i.d. copies of the density matrix $\rho$.

We fix an orthonormal basis $\{|v_1\rangle, \dots, |v_{\dim F}\rangle\}$ for the Hilbert space of F and write all the
transposes ($T$) with respect to this basis. Moreover, the state

\[ |\Phi\rangle_{EF} = \frac{1}{\sqrt{\dim F}} \sum_{i=1}^{\dim F} |v_i\rangle_E |v_i\rangle_F \]

is the maximally entangled state over EF where E is a copy of F.

$H(\cdot)$ denotes the entropy function (either Shannon or von Neumann entropy), and $I(\cdot\,;\cdot)$ is the
mutual information. For random variables X, Y, Z by $X - Y - Z$ we mean that $I(X;Z|Y) = 0$.
We use the same notation if either of X, Y or Z is a quantum register. When Y is classical
$X - Y - Z$ equivalently means that X can be generated out of Y using a channel independent
of Z. When Y is quantum however, by applying a measurement on Y to generate X we destroy
Y. So X, Y do not simultaneously exist and in this case $I(X;Z|Y)$ has no meaning. As a result
we reserve the notation $X - Y - Z$ for the case when all of X, Y, Z simultaneously exist and $I(X;Z|Y) = 0$.
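
For purely classical variables the Markov condition can be checked numerically. The following is a minimal sketch, not part of the original development: the two toy distributions, the channel parameters 0.1 and 0.25, and the helper names are illustrative assumptions. It computes $I(X;Z|Y)$ from a joint probability table and contrasts a genuine chain $X - Y - Z$ with a distribution that violates it.

```python
import math
from itertools import product


def cond_mutual_info(p):
    """Return I(X;Z|Y) in bits for a joint pmf given as {(x, y, z): prob}."""
    p_y, p_xy, p_yz = {}, {}, {}
    for (x, y, z), v in p.items():
        p_y[y] = p_y.get(y, 0.0) + v
        p_xy[(x, y)] = p_xy.get((x, y), 0.0) + v
        p_yz[(y, z)] = p_yz.get((y, z), 0.0) + v
    return sum(
        v * math.log2(v * p_y[y] / (p_xy[(x, y)] * p_yz[(y, z)]))
        for (x, y, z), v in p.items()
        if v > 0
    )


def noisy_copy(flip):
    """Binary symmetric channel {(input, output): prob} with crossover probability flip."""
    return {(i, o): (flip if i != o else 1 - flip) for i, o in product((0, 1), repeat=2)}


# Markov chain X - Y - Z: Y is a uniform bit; X and Z are independent noisy copies of Y.
to_x, to_z = noisy_copy(0.1), noisy_copy(0.25)
markov = {
    (x, y, z): 0.5 * to_x[(y, x)] * to_z[(y, z)]
    for x, y, z in product((0, 1), repeat=3)
}

# Not a Markov chain: X and Z are independent uniform bits and Y = X xor Z.
non_markov = {(x, x ^ z, z): 0.25 for x, z in product((0, 1), repeat=2)}

cmi_markov = cond_mutual_info(markov)          # vanishes up to rounding
cmi_non_markov = cond_mutual_info(non_markov)  # one full bit
```

In the second example, knowing Y makes X and Z perfectly dependent, so the conditional mutual information is one bit and the chain condition fails.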

Lemma 1 Suppose $X - F - Y$ where X, Y are classical random variables and F is a quantum
register which for every y is purified by E. Then having access to E one can generate X
independent of Y.

Proof: Due to the structure of tripartite states with $X - F - Y$ there exists a decomposition of
the Hilbert space of F of the form $\bigoplus_j F_j^L \otimes F_j^R$ such that the state of X, Y, F can be written as

\[ \bigoplus_j q(j) \Big( \sum_x p_j(x) |x\rangle\langle x| \otimes \rho_x^j \Big) \otimes \Big( \sum_y p_j(y) |y\rangle\langle y| \otimes \sigma_y^j \Big), \]

where $\rho_x^j$ and $\sigma_y^j$ are arbitrary states of $F_j^L$ and $F_j^R$ respectively. Thus for every y the state of F
is equal to

\[ \bigoplus_j q(j|y) \Big( \sum_x p_j(x) \rho_x^j \Big) \otimes \sigma_y^j, \qquad q(j|y) = \frac{q(j) p_j(y)}{p(y)}. \tag{1} \]

To find a purification of this state let $|\psi_x^j\rangle_{E_j^L F_j^L}$ and $|\phi_y^j\rangle_{E_j^R F_j^R}$ be purifications of $\rho_x^j$ and $\sigma_y^j$
respectively. Then

\[ \sum_j \sqrt{q(j|y)} \sum_x \sqrt{p_j(x)}\, |x\rangle_{X'} |\psi_x^j\rangle_{E_j^L F_j^L} |\phi_y^j\rangle_{E_j^R F_j^R} \]

is a purification of (1) where the register which purifies F is $E = X' \otimes \big(\bigoplus_j E_j^L \otimes E_j^R\big)$. Note that
all purifications of (1) are equivalent to the above purification up to a unitary, and E contains
X as a subsystem. We are done. □
For either probability distributions or quantum states the norm-one distance is denoted by
$\|\cdot\|_1$. We will also frequently use the gentle measurement lemma.

Lemma 2 (Gentle measurement lemma) Let $\rho$ be a quantum state and $\{M_0, M_1\}$ be a binary
measurement such that $\mathrm{tr}(M_0^\dagger M_0 \rho) \geq 1 - \epsilon$. Then after measuring $\rho$ with $\{M_0, M_1\}$ and obtaining
0 as the result, the state collapses to $\rho'$ such that

\[ \|\rho - \rho'\|_1 \leq 2\sqrt{\epsilon}. \]
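
The bound can be made concrete in the smallest possible case. The sketch below is an illustration under stated assumptions, not the general lemma: it takes a single qubit with density matrix $\rho = \begin{pmatrix} a & b \\ \bar b & 1-a \end{pmatrix}$ and the projective measurement $M_0 = |0\rangle\langle 0|$, so that $\epsilon = 1 - a$ and the post-measurement state is $|0\rangle\langle 0|$. The function name and parameterization are hypothetical.

```python
import math


def gentle_check(a, b_abs):
    """Qubit rho = [[a, b], [conj(b), 1 - a]] measured with the projector M0 = |0><0|.

    Returns (eps, norm1, bound) where eps = 1 - tr(M0 rho), norm1 is the
    norm-one distance between rho and the post-measurement state |0><0|,
    and bound = 2 * sqrt(eps).
    """
    if not 0 < a <= 1 or b_abs ** 2 > a * (1 - a) + 1e-12:
        raise ValueError("not a valid density matrix")
    eps = 1 - a
    # rho - |0><0| = [[a - 1, b], [conj(b), 1 - a]] is traceless and Hermitian,
    # with eigenvalues +/- sqrt((1 - a)^2 + |b|^2); its norm-one is twice that.
    norm1 = 2 * math.sqrt((1 - a) ** 2 + b_abs ** 2)
    return eps, norm1, 2 * math.sqrt(eps)


# A pure state (|b|^2 = a(1 - a)) with 99% weight on |0>.
eps, norm1, bound = gentle_check(0.99, math.sqrt(0.99 * 0.01))
```

For pure states the two quantities coincide in this example, which indicates that the $2\sqrt{\epsilon}$ scaling cannot be improved in general.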

We assume the reader is familiar with the notions of typicality and conditional typicality.
Here we only fix some notation. The state of a classical-quantum system XF has the form
$\sum_x p(x) |x\rangle\langle x| \otimes \rho_x$, and subsystem F has the average state $\rho = \sum_x p(x) \rho_x$. The $\delta$-typical
subspace of $\rho$ is determined by a projection $\Pi^n_{\rho,\delta}$ acting on the Hilbert space of $F^n$. We may
drop the index $\rho$ in $\Pi^n_{\rho,\delta}$ when there is no confusion.

For every $\delta, \epsilon > 0$ and sufficiently large n we have

\[ \mathrm{tr}\big(\Pi^n_\delta \rho^{\otimes n}\big) \geq 1 - \epsilon, \]
\[ (1 - \epsilon) 2^{n(H(F) - c\delta)} \leq \dim \Pi^n_\delta \leq (1 - \epsilon) 2^{n(H(F) + c\delta)}, \]

and

\[ 2^{-n(H(F) + c\delta)} \Pi^n_\delta \leq \Pi^n_\delta \rho^{\otimes n} \Pi^n_\delta \leq 2^{-n(H(F) - c\delta)} \Pi^n_\delta, \]

where c is some constant.
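
When $\rho$ is diagonal the typical subspace is spanned by the typical sequences of a classical source, and bounds of this type can be checked by direct counting. The sketch below makes several assumptions that are not in the text: it uses a Bernoulli(q) source, entropy typicality (a sequence is typical if $|-\frac{1}{n}\log p(x^n) - H| \leq \delta$) rather than strong typicality, and constant $c = 1$. The function name and the parameters q = 0.2, n = 400, $\delta$ = 0.1 are illustrative.

```python
import math


def typical_set_stats(q, n, delta):
    """Entropy-typical set of n i.i.d. Bernoulli(q) bits (a diagonal, i.e. classical, state).

    A sequence with k ones is typical if |-(1/n) log2 p(x^n) - H| <= delta.
    Returns (H, probability of the typical set, log2 of the number of typical sequences).
    """
    h = -q * math.log2(q) - (1 - q) * math.log2(1 - q)
    prob, size = 0.0, 0
    for k in range(n + 1):
        log_p = k * math.log2(q) + (n - k) * math.log2(1 - q)  # log2 p(x^n)
        if abs(-log_p / n - h) <= delta:
            count = math.comb(n, k)  # number of sequences with k ones
            size += count
            prob += count * 2.0 ** log_p
    return h, prob, math.log2(size)


h, prob, log_size = typical_set_stats(q=0.2, n=400, delta=0.1)
# With eps = 1 - prob, counting gives
#   log2(1 - eps) + n (H - delta) <= log2 |typical set| <= n (H + delta).
lower = math.log2(prob) + 400 * (h - 0.1)
upper = 400 * (h + 0.1)
```

The number of typical sequences plays the role of $\dim \Pi^n_\delta$, and the probability of the typical set plays the role of $\mathrm{tr}(\Pi^n_\delta \rho^{\otimes n})$.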

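For a given $x^n$ we define $\rho_{x^n} = \rho_{x_1} \otimes \cdots \otimes \rho_{x_n}$. Moreover, by $\Pi^{x^n}_\delta$ we mean the conditional
$\delta$-typical projection. Again we may drop the index $\delta$ in $\Pi^{x^n}_\delta$ when there is no confusion. For every
strongly $\delta$-typical $x^n$ we have

\[ \mathrm{tr}\big(\Pi^{x^n}_\delta \rho_{x^n}\big) \geq 1 - \epsilon, \]

\[ (1 - \epsilon) 2^{n(H(F|X) - c\delta'')} \leq \dim \Pi^{x^n}_\delta \leq (1 - \epsilon) 2^{n(H(F|X) + c\delta'')}, \]

and

\[ 2^{-n(H(F|X) + \delta'')} \Pi^{x^n}_\delta \leq \Pi^{x^n}_\delta \rho_{x^n} \Pi^{x^n}_\delta \leq 2^{-n(H(F|X) - \delta'')} \Pi^{x^n}_\delta, \]

where $\delta'' = \delta' |\mathcal{X}| \log(\dim F) + c\delta + |\mathcal{X}| c \delta \delta'$. We further have

\[ \mathrm{tr}\big(\Pi^n_\delta \rho_{x^n}\big) \geq 1 - \epsilon. \]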



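Lemma 3 Suppose $F - X - Y$ and let $x^n y^n$ be jointly typical. Then for sufficiently large n we
have

\[ \mathrm{tr}\big(\Pi^{y^n}_\delta \rho_{x^n}\big) \geq 1 - \epsilon. \]

Proof: Assume $|\mathcal{Y}| = k$ and

\[ y^n = \underbrace{1 \cdots 1}_{\ell_1} \underbrace{2 \cdots 2}_{\ell_2} \cdots \underbrace{k \cdots k}_{\ell_k}, \]

where $\sum_{i=1}^k \ell_i = n$. Since $y^n$ is typical, $|\ell_i / n - p(Y = i)| \leq \delta$, which implies that $\ell_i$ is sufficiently
large when n is (and $p(Y = i) > 0$).

By definition $\Pi^{y^n}_\delta = \Pi^{\ell_1}_{\sigma_1,\delta} \otimes \cdots \otimes \Pi^{\ell_k}_{\sigma_k,\delta}$ where $\Pi^{\ell_i}_{\sigma_i,\delta}$ is the typical projection with respect to
$\sigma_i = \sum_{x'} p(X = x' | Y = i) \rho_{x'}$.

With abuse of notation we may write

\[ \rho_{x^n} = \rho_{x_1} \otimes \rho_{x_2} \otimes \cdots \otimes \rho_{x_n} = \rho_{x^{\ell_1}} \otimes \rho_{x^{\ell_2}} \otimes \cdots \otimes \rho_{x^{\ell_k}}. \]

$x^{\ell_i}$ is $\delta$-typical with respect to $p(X = x | Y = i)$ since $x^n y^n$ is jointly typical. Thus for sufficiently
large $\ell_i$, we have

\[ \mathrm{tr}\big(\Pi^{\ell_i}_{\sigma_i,\delta} \rho_{x^{\ell_i}}\big) \geq 1 - \frac{\epsilon}{k}. \]

As a result,

\[ \mathrm{tr}\big(\Pi^{y^n}_\delta \rho_{x^n}\big) = \prod_{i=1}^k \mathrm{tr}\big(\Pi^{\ell_i}_{\sigma_i,\delta} \rho_{x^{\ell_i}}\big) \geq \Big(1 - \frac{\epsilon}{k}\Big)^k \geq 1 - \epsilon. \]

□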



3 Simulation of Correlations 

The problem of simulation of correlations via one-way communication is defined as follows. Alice
and Bob observe i.i.d. repetitions of two random variables X and Y, and would like to generate
i.i.d. repetitions of random variables A and B respectively. Random variables X, Y, A, B are
jointly distributed according to a given $p(a,b,x,y) = p(x,y) p(a,b|x,y)$. There is a one-way
communication link from Alice to Bob at rate R. The question is for which values of R the
channel with input (X, Y) and output (A, B) can be simulated.



3.1 Classical case (infinite shared randomness) 

Assume that Alice and Bob observe $X^n$ and $Y^n$ respectively, jointly distributed
according to $\prod_{i=1}^n p(x_i, y_i)$. They also share common randomness c at some arbitrary rate.
An $(n, \epsilon, R)$ code consists of a randomized encoder $p(m | x^n c)$ and two randomized decoders
$p(a^n | m x^n c)$, $p(b^n | m y^n c)$ such that $\frac{1}{n} H(M) \leq R$ and

\[ \Big\| \tilde{p}(a^n, b^n, x^n, y^n) - \prod_{i=1}^n p(a_i, b_i, x_i, y_i) \Big\|_1 \leq \epsilon, \tag{2} \]

where $\tilde{p}$ denotes the distribution induced by the code. A rate R is said to be achievable if
there exists a sequence of $(n, \epsilon_n, R)$ codes such that

\[ \lim_{n \to \infty} \epsilon_n = 0. \]

The set of achievable rates is denoted by $\mathcal{R}_c$.

Yassaee et al. [13] solve a generalized version of this problem. Their rate region (Theorem 1
of [13]) reduces to the following region as a special case.

Theorem 1

\[ \mathcal{R}_c = \{R : \exists F \text{ satisfying the following conditions and } R \geq I(X;F|Y)\}. \]

Here F is some auxiliary random variable with joint distribution $p(f,a,b,x,y) = p(a,b,x,y) p(f|a,b,x,y)$
satisfying

\[ \begin{aligned} & F - X - Y, \\ & A - FX - YB, \\ & B - FY - XA, \\ & |\mathcal{F}| \leq |\mathcal{X}| |\mathcal{Y}| |\mathcal{A}| |\mathcal{B}| + 1. \end{aligned} \tag{3} \]

In other words, R is achievable if and only if $R \geq \min I(X;F|Y)$ where the minimum is taken
over auxiliary random variables F satisfying the conditions given by (3).
3.2 Quantum case (infinite shared entanglement)



The setup here is similar to the classical case except that instead of shared randomness, Alice
and Bob are provided with an infinite supply of shared entanglement. To simulate the correlation,
Alice applies a measurement chosen according to the observed $x^n$ on her part of the shared
entanglement. The measurement outcome has two parts: the first part is taken as $a^n$ and
the second part is taken as the message m to be transmitted to Bob. Bob uses m and his
observation $y^n$ to choose a measurement to be applied on his quantum system. The outcome of
this measurement is taken as $b^n$.

A rate R is achievable if there exists a sequence of codes $(n, \epsilon_n, R)$ as above such that
$\frac{1}{n} H(M) \leq R$ and the total variation distance between the distribution $\tilde{p}(a^n, b^n, x^n, y^n)$ induced
by the code and the original distribution $\prod_{i=1}^n p(a_i, b_i, x_i, y_i)$ is at most $\epsilon_n$, i.e., (2) holds. The
set of achievable rates is denoted by $\mathcal{R}_q$.

For $\epsilon \geq 0$ define

\[ \begin{aligned} \mathcal{S}_\epsilon = \{R :\ & \exists F \text{ satisfying the following conditions and } R \geq I(X;F|Y)\} \\ & A, X, Y, F \sim \tilde{p}(a,x,y)\, \rho_{axy}, \\ & A, B, X, Y \sim \tilde{p}(a,b,x,y) = \tilde{p}(a,x,y) \tilde{p}(b|a,x,y), \\ & \|\tilde{p}(a,b,x,y) - p(a,b,x,y)\|_1 \leq \epsilon, \\ & F - X - Y, \\ & A - FX - Y, \\ & \exists \Psi \text{ s.t. } \Psi(F,Y) = (B,Y). \end{aligned} \tag{4} \]

By the first two constraints we mean that A, B, X, Y is distributed according to $\tilde{p}(a,b,x,y) =
\tilde{p}(a,x,y) \tilde{p}(b|a,x,y)$, and that A, X, Y, F is a classical-quantum (c-q) state where the distribution
over the classical part is $\tilde{p}(a,x,y)$. Thus B and F do not necessarily exist simultaneously.
The last constraint means that there exists a measurement on F chosen according to Y which
generates B.

Theorem 2 $\mathcal{S}_0 \subseteq \mathcal{R}_q \subseteq \bigcap_{\epsilon > 0} \mathcal{S}_\epsilon$.

Note that if, similar to the classical case, we could prove an upper bound on the dimension of
the register F in the definition of $\mathcal{S}_\epsilon$ (independent of $\epsilon$), we could conclude that
$\mathcal{S}_0 = \bigcap_{\epsilon > 0} \mathcal{S}_\epsilon = \mathcal{R}_q$.

The rest of this section is devoted to the proof of this theorem. 



3.3 Proof of $\mathcal{R}_q \subseteq \bigcap_{\epsilon > 0} \mathcal{S}_\epsilon$

For every $\epsilon > 0$ we show that $\mathcal{R}_q \subseteq \mathcal{S}_\epsilon$. Let $R \in \mathcal{R}_q$. Then by definition there exists an $(n, \epsilon, R)$
code for a sufficiently large n such that $H(M) \leq nR$. Let Q denote Bob's quantum part of the
shared state after Alice's measurement. We have

\[ \begin{aligned} nR \geq H(M) &\geq I(X^n; M | Y^n) \\ &= I(X^n; MQ | Y^n) && (5) \\ &= \sum_{i=1}^n I(X_i; MQ | X_{1:i-1} Y_i Y_{1:i-1} Y_{i+1:n}) \\ &= \sum_{i=1}^n I(X_i; MQ X_{1:i-1} Y_{1:i-1} Y_{i+1:n} | Y_i), && (6) \end{aligned} \]

where (5) follows from the no-signaling principle and (6) follows from $X^n, Y^n$ being i.i.d. Let
U be a random variable uniformly distributed over $\{1, 2, \dots, n\}$ and independent of all previously
defined registers. Let $X = X_U$, $Y = Y_U$, $A = A_U$, $B = B_U$, $F = (M, Q, X_{1:U-1}, Y_{1:U-1} Y_{U+1:n}, U)$.
Thus using equation (6) we can write

\[ R \geq \frac{1}{n} \sum_{i=1}^n I(X_i; MQ X_{1:i-1} Y_{1:i-1} Y_{i+1:n} | Y_i) = I(X;F|Y). \]
i=l 
Therefore if we show that A, B, X, Y, F satisfy the conditions given by (4), we are done.

By definition of $X = X_U$, $Y = Y_U$, $A = A_U$, $B = B_U$, the probability distribution over
A, B, X, Y is $\tilde{p}(a,b,x,y) = \frac{1}{n} \sum_{i=1}^n \tilde{p}(a_i = a, b_i = b, x_i = x, y_i = y)$. Therefore,

\[ \begin{aligned} \|\tilde{p}(a,b,x,y) - p(a,b,x,y)\|_1 &= \Big\| \frac{1}{n} \sum_{i=1}^n \tilde{p}(a_i, b_i, x_i, y_i) - p(a,b,x,y) \Big\|_1 \\ &\leq \frac{1}{n} \sum_{i=1}^n \big\| \tilde{p}(a_i, b_i, x_i, y_i) - p(a,b,x,y) \big\|_1 \\ &\leq \Big\| \tilde{p}(a^n, b^n, x^n, y^n) - \prod_{i=1}^n p(a_i, b_i, x_i, y_i) \Big\|_1 \\ &\leq \epsilon. \end{aligned} \]



Next to show that $F - X - Y$, note that

\[ \begin{aligned} I(F; Y | X) &= I(MQ X_{1:U-1} Y_{1:U-1} Y_{U+1:n} U; Y_U | X_U) \\ &= \frac{1}{n} \sum_{i=1}^n I(MQ X_{1:i-1} Y_{1:i-1} Y_{i+1:n}; Y_i | X_i) \\ &\leq \frac{1}{n} \sum_{i=1}^n I(MQ X_{1:i-1} X_{i+1:n} Y_{1:i-1} Y_{i+1:n}; Y_i | X_i) \\ &= \frac{1}{n} \sum_{i=1}^n I(MQ; Y_i | X_i X_{1:i-1} X_{i+1:n} Y_{1:i-1} Y_{i+1:n}) \\ &\leq \frac{1}{n} \sum_{i=1}^n I(MQ; Y^n | X^n) \\ &= I(MQ; Y^n | X^n) \\ &= 0, \end{aligned} \]

where the last step follows from the no-signaling principle.
Next to show that $A - FX - Y$, note that

\[ \begin{aligned} I(A; Y | XF) &= I(AF; Y | X) \\ &= I(A_U MQ X_{1:U-1} Y_{1:U-1} Y_{U+1:n} U; Y_U | X_U) \\ &= \frac{1}{n} \sum_{i=1}^n I(A_i MQ X_{1:i-1} Y_{1:i-1} Y_{i+1:n}; Y_i | X_i) \\ &\leq \frac{1}{n} \sum_{i=1}^n I(A^n MQ X_{1:i-1} X_{i+1:n} Y_{1:i-1} Y_{i+1:n}; Y_i | X_i) \\ &= \frac{1}{n} \sum_{i=1}^n I(A^n MQ; Y_i | X_i Y_{1:i-1} Y_{i+1:n} X_{1:i-1} X_{i+1:n}) \\ &\leq \frac{1}{n} \sum_{i=1}^n I(A^n MQ; Y^n | X^n) \\ &= I(A^n MQ; Y^n | X^n) \\ &= 0, \end{aligned} \]

where the first equality uses $I(F; Y | X) = 0$ and again the last step follows from the no-signaling principle.

Lastly to show that $\Psi(F, Y) = (B, Y)$ for some measurement $\Psi$ on F, Y, note that F, Y
includes $M, Q, Y_{1:U-1} Y_{U+1:n}, U, Y_U$, meaning that it contains $M, Q, Y^n, U$. Thus, we can use
the measurement on $M, Q, Y^n$ in the code which gives $B^n$, and depending on the value of U
construct $B = B_U$ out of F, Y.
3.4 Proof of $\mathcal{S}_0 \subseteq \mathcal{R}_q$

To prove the achievability it would help to start with a simpler problem, namely remote state
preparation with quantum side information. The setup of this problem is as follows. Let X, Y
be two random variables with joint distribution p(x,y), and let F be a quantum register such
that

\[ F - X - Y. \]

Alice and Bob receive $x^n$ and $y^n$ respectively, i.e., n i.i.d. copies of X, Y, and their goal is to
prepare $F^n$ at Bob's side. The question is how much classical communication from Alice to Bob
is required if they are provided with infinite shared entanglement.

Theorem 3 (Remote state preparation with quantum side information) The minimum rate of
one-way (classical) communication for remote state preparation with classical side information
and infinite shared entanglement is $I(X;F|Y)$.

$\mathcal{S}_0 \subseteq \mathcal{R}_q$ is a simple consequence of this theorem. Let F be a quantum register satisfying
conditions (4) for $\epsilon = 0$. By the above theorem Alice can prepare an approximation of $F^n$ at Bob's
side with almost $nI(X;F|Y)$ bits of one-way communication. In the remote state preparation
protocol Alice has an approximate purification of $F^n$ in hand (see the details of the proof below).
Thus using $A - FX - Y$ and based on Lemma 1 she can generate an approximation of $A^n$. On
the other hand, since by (4) there is a measurement on (F, Y) which gives B, Bob can generate
an approximation of $B^n$ after receiving Alice's message. So we only need to prove Theorem 3.

Proof: We start by showing that at least $I(X;F|Y)$ bits of communication per copy are
required. Suppose that for every $\epsilon > 0$ and sufficiently large n there is a protocol with nR bits of
communication in which Bob can prepare $\hat{F}^n$ such that the trace distance between the state of
$(X^n, Y^n, \hat{F}^n)$ and $(X^n, Y^n, F^n)$ is at most $\epsilon$. Then by the Fannes inequality we have

\[ I(X^n; F^n | Y^n) - n\epsilon \log d - \eta(\epsilon)/\ln 2 \leq I(X^n; \hat{F}^n | Y^n), \tag{7} \]

where $d = (\dim F) |\mathcal{X}| |\mathcal{Y}|$ and $\eta(\epsilon) = -\epsilon \ln \epsilon$. Let M be the message from Alice to Bob and
Q be Bob's part of the shared entanglement after receiving M. Then by the data processing
inequality we have

\[ \begin{aligned} I(X^n; \hat{F}^n | Y^n) &\leq I(X^n; MQ | Y^n) \\ &= I(X^n; Q | Y^n) + I(X^n; M | Y^n Q) \\ &= I(X^n; M | Y^n Q) \\ &\leq H(M) \\ &\leq nR, \end{aligned} \]

where in the third line we use the no-signaling principle. Combining the above inequality with (7)
gives the desired result.

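We now discuss the achievability protocol. For a sufficiently large n Alice and Bob share
$2^{n(I(F;X) + \delta'' + c\delta + \alpha)}$ copies of $|\psi\rangle_{E'F'} = \big(I \otimes \sqrt{(\dim F)^n\, \tau^n_\delta}\big) |\Phi\rangle_{E'F'}$ where

\[ \tau^n_\delta = \frac{1}{\dim \Pi^n_\delta} \Pi^n_\delta \]

and $|\Phi\rangle_{E'F'}$ is the maximally entangled state (Alice holds E' and Bob holds F'). They put these
copies in groups of size $2^{n(I(F;Y) - \delta'' - c\delta - \alpha)}$. Thus the number of groups is equal to

\[ \frac{2^{n(I(F;X) + \delta'' + c\delta + \alpha)}}{2^{n(I(F;Y) - \delta'' - c\delta - \alpha)}} = 2^{n(I(F;X) - I(F;Y) + 2\delta'' + 2c\delta + 2\alpha)}. \]

Alice and Bob respectively receive $x^n$ and $y^n$. With probability at least $1 - \epsilon$, $x^n y^n$ is jointly
typical. Alice measures her side of $|\psi\rangle$ for all copies using the measurement $\{Q^{x^n}_\delta, \sqrt{I - (Q^{x^n}_\delta)^2}\}$
where

\[ Q^{x^n}_\delta = \sqrt{2^{n(H(F|X) - \delta'')} \big(\Pi^{x^n}_\delta \rho_{x^n} \Pi^{x^n}_\delta\big)^T}. \]

The following lemma is proved in Appendix A.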
Lemma 4 If we measure $|\psi\rangle_{E'F'}$ by the measurement $\{Q^{x^n}_\delta, \sqrt{I - (Q^{x^n}_\delta)^2}\}$ acting on subsystem
E', then $Q^{x^n}_\delta$ is obtained with probability at least

\[ 2^{-n(I(X;F) + \delta'' + c\delta)} (1 - 4\sqrt{\epsilon}) \]

and in this case F' collapses to some $\rho'_{x^n}$ where

\[ \|\rho'_{x^n} - \rho_{x^n}\|_1 \leq 6\epsilon^{1/4}. \]

As a result, if this measurement is applied on $2^{n(I(F;X) + \delta'' + c\delta + \alpha)}$ copies of $|\psi\rangle_{E'F'}$, with
probability at least $1 - e^{-2^{n\alpha}(1 - 4\sqrt{\epsilon})}$ one of the outcomes is $Q^{x^n}_\delta$.
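
The second claim of the lemma rests on the elementary estimate $(1 - p)^N \leq e^{-pN}$ for N independent trials that each succeed with probability p. The following sketch checks this numerically; the exponents are arbitrary illustrative values chosen so that $pN = 2^4$, and the function name is hypothetical.

```python
import math


def prob_at_least_one_success(log2_p, log2_copies):
    """Probability that at least one of N independent trials succeeds.

    Each trial succeeds with probability p = 2**log2_p and N = 2**log2_copies.
    Returns (exact, lower_bound) with exact = 1 - (1 - p)^N and
    lower_bound = 1 - exp(-p * N).
    """
    p = 2.0 ** log2_p
    n_copies = 2.0 ** log2_copies
    # log1p and expm1 keep precision when p is tiny and N is huge.
    exact = -math.expm1(n_copies * math.log1p(-p))
    lower_bound = -math.expm1(-p * n_copies)
    return exact, lower_bound


# Success probability 2^-20 per copy and 2^24 copies, so p * N = 16.
exact, lower_bound = prob_at_least_one_success(-20, 24)
```

In the protocol, p is exponentially small in n while the number of copies exceeds $1/p$ by the factor $2^{n\alpha}$, so the failure probability is doubly exponentially small in n.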

This lemma states that with probability at least $1 - e^{-2^{n\alpha}(1 - 4\sqrt{\epsilon})}$ there exists an index i
(if there is more than one, pick one of them randomly) such that the outcome of the i-th
measurement is $Q^{x^n}_\delta$. Then Bob's side of the i-th copy of the shared states collapses to $\rho'_{x^n}$ where
$\|\rho_{x^n} - \rho'_{x^n}\|_1 \leq 6\epsilon^{1/4}$. Alice sends Bob the index of the group to which i belongs. She needs
$n(I(F;X) - I(F;Y) + 2\delta'' + 2c\delta + 2\alpha)$ bits of communication to send this index.

Now Bob applies the measurement $\{\Pi^{y^n}_\delta, I - \Pi^{y^n}_\delta\}$ on all subsystems in the group to which
i belongs. These measurements are indeed measurements on $\tau^n_\delta = \mathrm{tr}_{E'}\big(|\psi\rangle\langle\psi|_{E'F'}\big)$.

Lemma 5 If we apply the measurement $\{\Pi^{y^n}_\delta, I - \Pi^{y^n}_\delta\}$ on $2^{n(I(F;Y) - \delta'' - c\delta - \alpha)}$ copies of $\tau^n_\delta$, the
probability of obtaining more than one outcome $\Pi^{y^n}_\delta$ is at most $\frac{2^{-\alpha n + 1}}{1 - \epsilon}$.

The above lemma is proved in Appendix A. Thus among Bob's measurements with high
probability there is at most one outcome $\Pi^{y^n}_\delta$. On the other hand for the i-th subsystem we
have

\[ \mathrm{tr}\big(\Pi^{y^n}_\delta \rho'_{x^n}\big) \geq \mathrm{tr}\big(\Pi^{y^n}_\delta \rho_{x^n}\big) - 12\epsilon^{1/4} \geq 1 - \epsilon - 12\epsilon^{1/4}, \]

where we use Lemma 3. Therefore, this measurement helps Bob to distinguish the index
i. In fact by the gentle measurement lemma with high probability the measurement on the i-th
subsystem results in the state $\rho''_{x^n}$ such that

\[ \|\rho_{x^n} - \rho''_{x^n}\|_1 \leq \|\rho_{x^n} - \rho'_{x^n}\|_1 + \|\rho'_{x^n} - \rho''_{x^n}\|_1 \leq 6\epsilon^{1/4} + 2\sqrt{\epsilon + 12\epsilon^{1/4}}. \]

The probability of error of the protocol is less than or equal to

\[ e^{-2^{n\alpha}(1 - 4\sqrt{\epsilon})} + \frac{2^{-\alpha n + 1}}{1 - \epsilon} + \epsilon + 12\epsilon^{1/4}, \]

and the number of communicated bits is equal to

\[ n(I(F;X) - I(F;Y) + 2\delta'' + 2c\delta + 2\alpha) = n(I(X;F|Y) + 2\delta'' + 2c\delta + 2\alpha). \]

□
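
The final rate uses the identity $I(F;X) - I(F;Y) = I(X;F|Y)$, which holds under the Markov chain $F - X - Y$ because $I(F;XY)$ can be expanded as $I(F;X) + I(F;Y|X)$ and as $I(F;Y) + I(F;X|Y)$, with $I(F;Y|X) = 0$. The sketch below checks the identity for a classical F only; the joint distribution and the helper name `mutual_info` are illustrative assumptions, and the quantum case is not covered by this check.

```python
import math


def mutual_info(p, a_idx, b_idx, c_idx=()):
    """I(A;B|C) in bits; p maps outcome tuples to probabilities, *_idx select coordinates."""
    def entropy_of(idx):
        marg = {}
        for key, v in p.items():
            sub = tuple(key[i] for i in idx)
            marg[sub] = marg.get(sub, 0.0) + v
        return -sum(v * math.log2(v) for v in marg.values() if v > 0)

    a, b, c = tuple(a_idx), tuple(b_idx), tuple(c_idx)
    return entropy_of(a + c) + entropy_of(b + c) - entropy_of(a + b + c) - entropy_of(c)


# Correlated inputs p(x, y) and a channel p(f | x); F depends on (X, Y) only through X.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.15, (1, 1): 0.35}
p_f_given_x = {0: (0.7, 0.2, 0.1), 1: (0.1, 0.3, 0.6)}
joint = {
    (f, x, y): p_xy[(x, y)] * p_f_given_x[x][f]
    for (x, y) in p_xy
    for f in range(3)
}

F, X, Y = (0,), (1,), (2,)  # coordinate positions inside the outcome tuples
rate_gap = mutual_info(joint, F, X) - mutual_info(joint, F, Y)
cond_rate = mutual_info(joint, F, X, Y)
```

Here `rate_gap` corresponds to the exponent of the number of groups in the protocol and `cond_rate` to the rate stated in Theorem 3.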

3.5 Example: the CHSH-type correlations 

Consider the problem of simulating the following correlation for a given $\epsilon > 0$. Let x, y as well
as a, b be binary, and let $p(a,b|x,y)$ be equal to $\frac{1+\epsilon}{4}$ when $a \oplus b = xy$ holds and $\frac{1-\epsilon}{4}$ otherwise.
This correlation corresponds to a winning strategy for the CHSH game [21] with probability
$p = \frac{1+\epsilon}{2}$. Here we
consider the uniform distribution on inputs ($p(x,y) = p(x)p(y) = \frac{1}{4}$), and study the problem of
simulation of this correlation in both classical and quantum settings.
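
As a sanity check on this family of correlations, the sketch below builds the conditional distribution and verifies that it is normalized, no-signaling, and wins the CHSH game with probability $(1 + \epsilon)/2$. It assumes the standard isotropic form of the box, $p(a,b|x,y) = (1 + \epsilon)/4$ when $a \oplus b = xy$ and $(1 - \epsilon)/4$ otherwise; the function names are hypothetical.

```python
import math
from itertools import product


def chsh_box(eps):
    """Assumed isotropic form: p(a,b|x,y) = (1+eps)/4 if a xor b == x*y, else (1-eps)/4."""
    return {
        (a, b, x, y): (1 + eps) / 4 if (a ^ b) == (x & y) else (1 - eps) / 4
        for a, b, x, y in product((0, 1), repeat=4)
    }


def win_probability(box):
    """Probability that a xor b == x*y when the inputs are uniform, p(x, y) = 1/4."""
    return sum(0.25 * v for (a, b, x, y), v in box.items() if (a ^ b) == (x & y))


def is_no_signaling(box, tol=1e-12):
    """Check that Alice's marginal is independent of y and Bob's is independent of x."""
    for out, inp in product((0, 1), repeat=2):
        alice = [sum(box[(out, b, inp, y)] for b in (0, 1)) for y in (0, 1)]
        bob = [sum(box[(a, out, x, inp)] for a in (0, 1)) for x in (0, 1)]
        if abs(alice[0] - alice[1]) > tol or abs(bob[0] - bob[1]) > tol:
            return False
    return True


# Bias 1/sqrt(2) gives the largest winning probability reachable with entanglement alone.
box = chsh_box(1 / math.sqrt(2))
p_win = win_probability(box)
```

With bias $1/\sqrt{2}$ the winning probability is $\frac{1}{2} + \frac{1}{2\sqrt{2}} \approx 0.854$, the value that entangled parties can reach without communication.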

Should we allow for infinite preshared randomness, the communication cost would be given
by the expression discussed earlier in Theorem 1, i.e. the minimum of $I(X;U|Y)$ over all classical
random variables U determined by $p(u|x,y,a,b)$ such that the joint distribution $p(u,a,b,x,y)$
factorizes as $p(u,a,b,x,y) = p(x,y) p(u|x) p(a|u,x) p(b|u,y)$.

Independence of X and Y implies that $I(X;U|Y) = I(X;U)$. Moreover, U can be taken to
be a binary random variable using the Fenchel extension of the Caratheodory theorem. Then
computing the optimal rate for every $\epsilon$ is a straightforward optimization problem. The blue plot
of Fig. 1 gives the one-way communication cost of winning the CHSH game with probability
$p = \frac{1+\epsilon}{2}$.
Figure 1: (Blue curve) The one-way communication cost of simulating the CHSH-type correlation
with bias $\epsilon$ assuming preshared randomness. The horizontal axis corresponds to the parameter
$p = \frac{1+\epsilon}{2}$ ($0.75 \leq p \leq 1$). (Red curve) A lower bound on the entanglement-assisted one-way
communication cost of simulating the CHSH-type correlation with bias $\epsilon = 2p - 1$. This lower
bound is an implication of Information Causality.



From this plot we observe that at $p = \frac{1}{2} + \frac{1}{2\sqrt{2}}$ we get a positive rate while in the presence of
entanglement $p = \frac{1}{2} + \frac{1}{2\sqrt{2}}$ can be achieved with no communication. This means that, unlike the
problem of point-to-point channel capacity, entanglement does help in the problem of simulation
of correlations.

Should we allow for infinite shared entanglement, we have lower and upper bounds on the
communication cost by Theorem 2. Unfortunately both the lower and the upper bounds are
non-computable since we have no bound on the dimension of the auxiliary register F. Therefore,
to find a computable bound we use an ad hoc technique based on the recently proposed principle
of Information Causality [20].

Information Causality is based on the following communication scenario. Alice receives a
binary string $a_1 \dots a_N$ chosen according to the uniform distribution, and Bob receives a random
$b \in \{1, \dots, N\}$. Alice sends a message m to Bob whose goal is to find $a_b$. Letting $g_b$ be Bob's
guess, Information Causality states that

\[ H(M) \geq \sum_{i=1}^N I(A_i; G_i | B = i). \tag{8} \]

In fact the above inequality holds in any physical theory, including the quantum theory, that
admits a mutual information satisfying certain natural properties.

It is shown in [20] that Alice and Bob, by sharing $k = 2^n - 1$ no-signaling boxes with CHSH-type
correlations with bias $\epsilon$ and sending only one bit from Alice to Bob, can play the above
game for $N = 2^n$ in such a way that the right hand side of (8) is equal to $2^n \big(1 - h\big(\frac{1 + \epsilon^n}{2}\big)\big)$,
where $h(\cdot)$ denotes the binary entropy function. We now would like to simulate this scheme by
two new parties, say Alice' and Bob', who instead of non-local boxes, have shared entanglement
as their resources at the outset.

Let R q be the entanglement-assisted communication cost of simulating the non-local box 
with bias e. Alice' and Bob' can simulate the scheme of Alice and Bob by first sending kR q 
bits from Alice' to Bob' to simulate the k boxes, and then one bit to simulate the message that 
was passed from Alice to Bob. This enables Bob' to faithfully simulate <?j. Now since Alice' 
and Bob' play the game in a quantum world for which Information Causality holds, we may use 
inequality The right hand side of ^ is equal to 2™ (l — h(^ 1 )) and the left hand side, 
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namely the number of communicated bit is kR q + H(C). Therefore, 



(2" - l)R q + 1 > 2" [1 

which implies 

R" > l-h' 



2" - 1 V V 2 ) ) 2 n - 1 

Computing this lower bound for all $n$ and taking the optimal one for every $\epsilon$, we obtain
the red plot of Fig. 1. We see that the lower bound is equal to one at $p = 1$, thus it has to be
tight at this point. By [20], the above lower bound (for $n$ converging to infinity) would also be
tight at the other end point $p = \frac{1}{2} + \frac{1}{2\sqrt{2}}$. However, it may be loose in
between because, firstly, we have considered the specific scheme of [20] for using boxes, and
secondly, this lower bound holds more generally for any physical theory satisfying the properties
of mutual information given in [20], and not only for quantum physics. Nonetheless, we would like
to highlight that the lower bound at $p = 1$ is tight in any such physical theory, as shown in the
figure.
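
The optimization over $n$ described above can be sketched numerically. The snippet below is only an illustration and not the authors' code: it assumes that the bound reads $R_q \ge \big(2^n(1 - h(\frac{1+\epsilon^n}{2})) - 1\big)/(2^n - 1)$ with $h$ measured in bits, and the function names and the cutoff `max_n` are hypothetical choices.

```python
import math


def binary_entropy(t: float) -> float:
    """Binary entropy h(t) in bits, with h(0) = h(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1.0 - t) * math.log2(1.0 - t)


def rq_lower_bound(eps: float, n: int) -> float:
    """Bound obtained from (2^n - 1) R_q + 1 >= 2^n (1 - h((1 + eps^n) / 2))."""
    big_n = 2 ** n
    rhs = big_n * (1.0 - binary_entropy((1.0 + eps ** n) / 2.0))
    return (rhs - 1.0) / (big_n - 1)


def best_rq_lower_bound(eps: float, max_n: int = 30) -> float:
    """Best bound over n = 1..max_n (max_n is an assumed cutoff), clipped at zero."""
    return max(0.0, max(rq_lower_bound(eps, n) for n in range(1, max_n + 1)))
```

At bias $\epsilon = 1$ every $n$ gives exactly one bit, in agreement with the tightness at $p = 1$ noted above.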



4 Quantum Conditioning 

A classical communication problem whose entanglement-assisted rate region is not completely
understood is the Gray-Wyner problem [16] (for the definition of this classical communication
problem see Appendix B). In Appendix B we observe that for a given distribution $p(x_1 \dots x_n)$
the rate region, when infinite shared entanglement is available, includes the set of tuples

$$\big(H(X_1 \dots X_n) - H(X_1 \dots X_n|F),\ H(X_1|F),\ \dots,\ H(X_n|F)\big)$$

for all quantum registers $F$. Replacing $F$ by a classical random variable gives the classical rate
region. This observation suggests that to study the role of shared entanglement in classical
communication settings, one needs to understand the meaning of conditional entropy given a
quantum register, $H(\cdot|F)$.

In the classical world, conditioning on a random variable has several meanings, one of which
is convexification. To isolate this interpretation of conditioning we begin with some notation.
Fix finite sets $\mathcal{X}_1, \dots, \mathcal{X}_n$ and consider the mapping
$p(x_1 \dots x_n) \mapsto (H(X_1), \dots, H(X_n))$. The domain of this mapping is the probability
simplex on $\mathcal{X}_1 \times \cdots \times \mathcal{X}_n$. Let $\mathcal{G}$ be the graph of this
mapping, i.e.,

$$\mathcal{G} = \{(p(x_1,\dots,x_n), H(X_1), \dots, H(X_n)) : \text{for all } p(x_1,\dots,x_n)\}.$$

Then ConvHull($\mathcal{G}$), the convex envelope of $\mathcal{G}$, can be seen to be equal to

$$\mathrm{ConvHull}(\mathcal{G}) = \{(p(x_1,\dots,x_n), H(X_1|C), \dots, H(X_n|C)) : \text{for all } p(x_1,\dots,x_n,c)\}.$$

Thus conditioning on a (classical) random variable is equivalent to convexification.
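
The equivalence rests on the identity $H(X_i|C) = \sum_c p(c) H(X_i|C=c)$, which writes each conditional entropy as a convex combination of unconditional entropies. The following sketch checks this identity numerically; the joint distribution `JOINT` and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginal of a joint dict {tuple: prob} on the coordinate positions in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return dict(out)


def conditional_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(marginal(joint, target + given)) - entropy(marginal(joint, given))


def mixture_of_entropies(joint, target, given):
    """sum_c p(c) H(target | C = c): the convex-combination form."""
    total = 0.0
    for c, pc in marginal(joint, given).items():
        cond = defaultdict(float)
        for outcome, p in joint.items():
            if tuple(outcome[i] for i in given) == c:
                cond[tuple(outcome[i] for i in target)] += p / pc
        total += pc * entropy(cond)
    return total


# Hypothetical joint distribution over (x1, x2, c).
JOINT = {
    (0, 0, 0): 0.30, (0, 1, 0): 0.10, (1, 0, 0): 0.10,
    (1, 1, 1): 0.25, (0, 1, 1): 0.15, (1, 0, 1): 0.10,
}
```

For example, `conditional_entropy(JOINT, (0,), (2,))` and `mixture_of_entropies(JOINT, (0,), (2,))` both compute $H(X_1|C)$ and agree up to rounding.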

Now the question is what happens when we allow $C$ to be a quantum register. In other words,
what can we say about the following set:

$$\mathrm{QConvHull}(\mathcal{G}) := \{(p(x_1,\dots,x_n), H(X_1|F), \dots, H(X_n|F)) : \text{for all } p(x_1,\dots,x_n) \text{ and } \rho_{x_1 \dots x_n}\}.$$

Observe that QConvHull($\mathcal{G}$) is convex and contains ConvHull($\mathcal{G}$). The question is
whether this containment is strict or equality holds. One difficulty in understanding
QConvHull($\mathcal{G}$) is that, unlike the classical case, no bound on the dimension of $F$ is
known. As a result, QConvHull($\mathcal{G}$) is not even known to be a computable region.

Example. Let $n = 3$ and $X_3 = (X_1, X_2)$. Then for any $p(x_1,x_2)$ the coordinates of the triple
$(H(X_1|F), H(X_2|F), H(X_3|F))$ satisfy $H(X_3|F) \le H(X_1|F) + H(X_2|F)$. Suppose we are
interested in the set of triples where the equality $H(X_3|F) = H(X_1|F) + H(X_2|F)$ holds. In this
case $I(X_1;X_2|F) = 0$. Now using the structure of states that satisfy strong subadditivity of
quantum entropy with equality [22], we conclude that there exists a classical random variable $C$
such that $I(X_1;X_2|C) = 0$, $H(X_1|C) = H(X_1|F)$ and $H(X_2|C) = H(X_2|F)$. This means that
the quantum and classical regions are the same under the constraint that the third coordinate
of the triple is equal to the sum of the first two coordinates.

The first main result of this section is that quantum conditioning is strictly richer than clas- 
sical conditioning. Here we introduce new tools that could be useful in bounding the dimension 
of quantum registers as well. 

Theorem 4 (a) The following three statements are equivalent:

1. QConvHull($\mathcal{G}$) = ConvHull($\mathcal{G}$) for any finite sets $\mathcal{X}_1, \dots, \mathcal{X}_n$.

2. For a classical-quantum channel $X \to F$ determined by a collection of density matrices
$\rho_x$ for $x \in \mathcal{X}$, consider the function $p(x) \mapsto I(X;F)$ for distributions
$p(x)$ on $\mathcal{X}$. Then for every $\epsilon > 0$ there exists a classical channel $X \to C$
determined by $q(c|x)$ such that $|I(X;F) - I(X;C)| \le \epsilon$ for all $p(x)$.

3. For any arbitrary $q(x,y,z)$ consider the optimization problem

$$\sup_{F-X-YZ} I(F;Y) - I(F;Z),$$

over all quantum registers $F$ satisfying $F - X - YZ$. Then the supremum is a maximum
and is attained at a classical $F$.

(b) There is a counterexample for part (1), implying that all of the above three statements are
false.

Part (a2) of the theorem introduces the problem of uniformly approximating the mutual
information curve (or surface) $p(x) \mapsto I(X;F)$ with classical ones. The mutual information
$I(X;F)$ is concave in $p(x)$. Therefore the problem is that of approximating a concave curve (or
surface) with another one. The theorem says that this is not possible for some
classical-quantum channels.
Consider the optimization problem introduced in part (a3) of the theorem. Because of the
Markov chain, the auxiliary system $F$ is determined by the collection of states
$\{\sigma_x : x \in \mathcal{X}\}$. Note that the supremum is not known to be computable because no
bound on the dimension of $F$ is known. The classical form of the expression is
$\sup_{p(c|x)} I(C;Y) - I(C;Z)$, where we are taking the supremum over all classical channels
$p(c|x)$. In the classical case we know that the supremum is indeed a maximum, and further the
cardinality of $C$ can be bounded from above by $|\mathcal{X}|$ using the strengthened Carathéodory
theorem of Fenchel and Bunt (because of the Markov chain, the cardinality bound does not depend
on $|\mathcal{Y}|$ and $|\mathcal{Z}|$). Here we have chosen the expression $I(C;Y) - I(C;Z)$ since
it shows up in many information theoretic problems, especially those involving security. The
theorem shows that there exists a distribution $q(x,y,z)$ such that

$$\sup_{F-X-YZ} I(F;Y) - I(F;Z) > \max_{p(c|x)} I(C;Y) - I(C;Z). \qquad (9)$$
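
To make the classical side of this comparison concrete, the sketch below evaluates $I(C;Y) - I(C;Z)$ for one given classical channel $p(c|x)$ and distribution $q(x,y,z)$. It is a minimal illustration under stated assumptions: the names `secrecy_objective`, `Q_XYZ`, `IDENTITY` and `CONSTANT` are hypothetical, and no maximization over channels is attempted.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_information(joint_ab):
    """I(A;B) in bits for a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint_ab.items():
        pa[a] += p
        pb[b] += p
    return entropy(pa) + entropy(pb) - entropy(joint_ab)


def secrecy_objective(q_xyz, channel):
    """I(C;Y) - I(C;Z) when C is generated from X alone, so that C - X - YZ holds.

    q_xyz:   dict {(x, y, z): probability}
    channel: dict {x: {c: p(c|x)}}
    """
    p_cy, p_cz = defaultdict(float), defaultdict(float)
    for (x, y, z), q in q_xyz.items():
        for c, pc in channel[x].items():
            p_cy[(c, y)] += q * pc
            p_cz[(c, z)] += q * pc
    return mutual_information(p_cy) - mutual_information(p_cz)


# Hypothetical example: X is a uniform bit, Y = X, and Z is an independent uniform bit.
Q_XYZ = {(x, x, z): 0.25 for x in (0, 1) for z in (0, 1)}
IDENTITY = {0: {0: 1.0}, 1: {1: 1.0}}  # C = X
CONSTANT = {0: {0: 1.0}, 1: {0: 1.0}}  # C carries no information about X
```

In this toy example the identity channel attains one bit and the constant channel attains zero; a search over all channels with $|\mathcal{C}| \le |\mathcal{X}|$ would be needed to obtain the maximum in general.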

In the second main result of this section we consider the above statements in the special case
where $|\mathcal{X}| = \dim F = 2$.

Theorem 5 1. Let $|\mathcal{X}| = \dim F = 2$ and consider a channel $X \to F$ determined by
$\rho_0^F, \rho_1^F$. Then one can find a classical channel $q(c|x)$ such that $I(X;F) = I(X;C)$
for all $p(x)$.

2. For any arbitrary $q(x,y,z)$ where $X$ is binary, the optimization problem

$$\sup_{F-X-YZ} I(F;Y) - I(F;Z),$$

over all quantum registers $F$ of dimension two is a maximum, and one can always find a
maximizer $F$ that is classical.

The above two theorems are proved in the following three subsections. 
4.1 Proof of part (a) of Theorem 4

We show that (a1) implies (a3), (a2) implies (a1), and (a3) implies (a2). The fact that (a1)
implies (a3) is immediate, noting that

$$I(F;Y) - I(F;Z) = H(Y) - H(Z) - H(Y|F) + H(Z|F)$$

can be expressed in terms of conditional entropies given a quantum register. The Markov
constraint $F - X - YZ$ can also be written as $H(YZ|X) = H(YZ|X,F)$, or alternatively as
$H(YZ|X) = H(YZX|F) - H(X|F)$, in terms of conditional entropies given the quantum register.

To show that (a2) implies (a1), take some arbitrary finite sets $\mathcal{X}_1,\dots,\mathcal{X}_n$,
$q(x_1,\dots,x_n)$ and $\rho_{x_1 \dots x_n}$. Let $X = (X_1,\dots,X_n)$. Then by (a2), for any
$\epsilon > 0$ one can find a classical channel $q(c|x)$ such that $|I(X;F) - I(X;C)| \le \epsilon$
for all $p(x)$. We show that this implies that $|I(X_i;F) - I(X_i;C)| \le 2\epsilon$ for
$1 \le i \le n$. Observe that

$$\begin{aligned}
I(X_i;F) &= I(X_1X_2\cdots X_n;F) - I(X_1X_2\cdots X_n;F|X_i)\\
&= I(X;F) - I(X;F|X_i)\\
&= I(X;F) - \sum_{x_i} p(x_i)\, I(X;F|X_i = x_i).
\end{aligned}$$

But $I(X;F|X_i = x_i)$ is nothing but the mutual information between $X$ and $F$ at the conditional
distribution $p_{X|X_i}(x|x_i)$. Thus we have $|I(X;F) - I(X;C)| \le \epsilon$ and
$|I(X;F|X_i = x_i) - I(X;C|X_i = x_i)| \le \epsilon$ for any $x_i$. This implies

$$|I(X_i;F) - I(X_i;C)| \le \epsilon + \sum_{x_i} p(x_i)\epsilon = 2\epsilon,$$

and

$$|H(X_i|F) - H(X_i|C)| \le 2\epsilon.$$

Therefore we have approximated $H(X_i|F)$ with a classical $H(X_i|C)$ for all $i$ within
$2\epsilon$. This implies that

$$\mathrm{QConvHull}(\mathcal{G}) \subseteq \overline{\mathrm{ConvHull}(\mathcal{G})},$$

where $\overline{\mathrm{ConvHull}(\mathcal{G})}$ is the closure of ConvHull($\mathcal{G}$). But
ConvHull($\mathcal{G}$) is a closed set, since in the classical case we can bound the cardinality of
the auxiliary $C$ by $|\mathcal{X}_1| \times |\mathcal{X}_2| \times \cdots \times |\mathcal{X}_n| + n$
using the Carathéodory theorem.

Showing that (a3) implies (a2) is challenging. Assume that for any distribution $q(x,y,z)$ we
have

$$\sup_{F-X-YZ} I(F;Y) - I(F;Z) = \max_{p(c|x)} I(C;Y) - I(C;Z). \qquad (10)$$

Note that the left-hand side of (10) is not known to be computable because no bound on the
dimension of $F$ is known. However, the right-hand side is computable since we can impose the
restriction $|\mathcal{C}| \le |\mathcal{X}|$. Let $\mathcal{P}$ denote the class of all classical
channels $p(c|x)$ where $|\mathcal{C}| \le |\mathcal{X}|$.

Fix a distribution $q(x)$ on $\mathcal{X}$ and an arbitrary classical-quantum channel $X \to F$ with
states $\sigma_x$, $x \in \mathcal{X}$. Without loss of generality we assume $q(x) > 0$ for all
$x \in \mathcal{X}$. By our assumption (10) we have

$$I(F;Y) - I(F;Z) \le \max_{p(c|x) \in \mathcal{P}} I(C;Y) - I(C;Z), \qquad \forall q(y,z|x).$$



Equivalently,

$$\max_{q(y,z|x)} \Big[ I(F;Y) - I(F;Z) - \max_{p(c|x) \in \mathcal{P}} \big( I(C;Y) - I(C;Z) \big) \Big] \le 0.$$



Take an arbitrary $\epsilon > 0$. Lemma 7 of Appendix C shows that one can find a finite set of
channels $p(c_m|x)$ ($m = 1, 2, \dots, M_\epsilon$) to uniformly approximate the continuous function
$I(C;Y) - I(C;Z)$ over the compact set $\mathcal{P}$ within $\epsilon$ (an $\epsilon$-net); i.e., for
every $p(c|x) \in \mathcal{P}$ there exists $m$ such that

$$\big| (I(C;Y) - I(C;Z)) - (I(C_m;Y) - I(C_m;Z)) \big| \le \epsilon, \qquad \forall q(y,z|x).$$

Thus

$$\max_{q(y,z|x)} \Big[ I(F;Y) - I(F;Z) - \max_{1 \le m \le M_\epsilon} \big( I(C_m;Y) - I(C_m;Z) \big) \Big] \le \epsilon.$$



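This means that

$$\max_{q(y,z|x)} \Big[ I(F;Y) - I(F;Z) - \max_{\lambda_m \ge 0:\ \sum_m \lambda_m = 1}\ \sum_m \lambda_m \big( I(C_m;Y) - I(C_m;Z) \big) \Big] \le \epsilon.$$

Alternatively,

$$\max_{q(y,z|x)}\ \min_{\lambda_m \ge 0:\ \sum_m \lambda_m = 1} \Big[ I(F;Y) - I(F;Z) - \sum_m \lambda_m \big( I(C_m;Y) - I(C_m;Z) \big) \Big] \le \epsilon.$$

Lemma 8 of Appendix C shows that we can exchange the order of maximum and minimum to
get

$$\min_{\lambda_m \ge 0:\ \sum_m \lambda_m = 1}\ \max_{q(y,z|x)} \Big[ I(F;Y) - I(F;Z) - \sum_m \lambda_m \big( I(C_m;Y) - I(C_m;Z) \big) \Big] \le \epsilon.$$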

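Thus there exists a choice of $\lambda_m$, not depending on $q(y,z|x)$, such that

$$\max_{q(y,z|x)} \Big[ I(F;Y) - I(F;Z) - \sum_m \lambda_m \big( I(C_m;Y) - I(C_m;Z) \big) \Big] \le \epsilon,$$

which is equivalent to

$$\max_{q(y,z|x)} \Big[ H(F|Z) - H(F|Y) - \sum_m \lambda_m \big( H(C_m|Z) - H(C_m|Y) \big) \Big] \le \epsilon.$$

Assuming $q(y,z|x) = q(y|x)q(z|x)$ we obtain

$$\max_{q(z|x)} \Big[ H(F|Z) - \sum_m \lambda_m H(C_m|Z) \Big] + \max_{q(y|x)} \Big[ -H(F|Y) + \sum_m \lambda_m H(C_m|Y) \Big] \le \epsilon.$$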



We can express the two maximums in terms of the same channel as follows:

$$\max_{q(z|x)} \Big[ H(F|Z) - \sum_m \lambda_m H(C_m|Z) \Big] \le \epsilon + \min_{q(z|x)} \Big[ H(F|Z) - \sum_m \lambda_m H(C_m|Z) \Big]. \qquad (11)$$



Let us define

$$W(p(x)) = H(F) - \sum_m \lambda_m H(C_m) = H\Big(\sum_x p(x)\sigma_x\Big) - \sum_m \lambda_m H\Big(\sum_x p(x)\, p(c_m|x)\Big).$$

Then the left-hand side of (11) is the upper concave envelope$^2$ of the graph of $W(p(x))$, whereas
the right-hand side is the lower convex envelope of $W(p(x))$. We know that the difference
between the two is at most $\epsilon$. Were these two exactly equal, the function $W(p(x))$ would
have to be linear in $p(x)$ for all $p(x)$ (and not just the $q(x)$ we started with). Therefore the
function $W(p(x))$ is almost linear.

The function

$$V(p(x)) = I(F;X) - \sum_m \lambda_m I(C_m;X) = I(F;X) - I(D;X)$$

is equal to $W(p(x))$ plus a linear term in $p(x)$, where $D$ is defined as $D = (U, C_U)$ and
$U$ is a random variable, independent of $X$, taking value $m$ with probability $\lambda_m$. As a
result, the upper concave envelope and lower convex envelope of $V(p(x))$ at $q(x)$ are also
$\epsilon$-close to each other.
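
The step $\sum_m \lambda_m I(C_m;X) = I(D;X)$ with $D = (U, C_U)$ follows from the chain rule, because $U$ is independent of $X$. The sketch below checks it numerically for classical channels; the channels, weights and input distribution are hypothetical values used only for illustration.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_information(joint_ab):
    """I(A;B) in bits for a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint_ab.items():
        pa[a] += p
        pb[b] += p
    return entropy(pa) + entropy(pb) - entropy(joint_ab)


def joint_x_c(p_x, channel):
    """Joint distribution of (X, C) for the channel {x: {c: p(c|x)}}."""
    return {(x, c): px * pc for x, px in p_x.items() for c, pc in channel[x].items()}


def joint_x_d(p_x, channels, weights):
    """Joint of X and D = (U, C_U), where U is independent of X and P(U = m) = weights[m]."""
    joint = {}
    for m, (lam, channel) in enumerate(zip(weights, channels)):
        for (x, c), p in joint_x_c(p_x, channel).items():
            joint[(x, (m, c))] = lam * p
    return joint


# Hypothetical data: a binary symmetric channel and an erasure-type channel.
P_X = {0: 0.3, 1: 0.7}
CHANNELS = [
    {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}},
    {0: {0: 0.5, "e": 0.5}, 1: {1: 0.5, "e": 0.5}},
]
WEIGHTS = [0.4, 0.6]

averaged = sum(w * mutual_information(joint_x_c(P_X, ch)) for w, ch in zip(WEIGHTS, CHANNELS))
combined = mutual_information(joint_x_d(P_X, CHANNELS, WEIGHTS))
```

Here `averaged` and `combined` agree up to rounding, for any choice of `P_X`.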

The function $V(p(x))$ is zero when $p(x)$ assigns probability one to a single symbol (i.e., on
the vertices of the probability simplex). Thus its lower convex envelope is less than or equal to
the zero function, whereas its upper concave envelope is greater than or equal to zero. Since the
gap between the two is at most $\epsilon$ at the given $q(x)$ and $q(x) > 0$ for all $x$,
$|V(p(x))|$ should be close to zero for every $p(x)$. Thus $|I(F;X) - I(D;X)| \le O(\epsilon)$ for
all $p(x)$. This completes the proof.



2 The upper concave envelope of a function $f(t)$ is the smallest concave function $g(t)$ such that
$f(t) \le g(t)$ for all $t$. The lower convex envelope is defined similarly.
4.2 Proof of part (b) of Theorem 4

The counterexample is inspired by the examples of (classical) channels for which the (one-shot)
entanglement-assisted zero-error capacity is greater than the zero-error capacity. Here we explain
the details based on the Kochen-Specker type channel of [14].

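Let $\mathcal{M} = \{\alpha,\beta,\gamma,\delta,\epsilon,\zeta\}$ and
$\mathcal{X} = \{\theta_i : \theta \in \mathcal{M}, 1 \le i \le 4\}$. Moreover, let
$\mathcal{Y} = \{S_1, S_2, \dots, S_{18}\}$ where the $S_k$'s are certain four-element subsets of
$\mathcal{X}$:

$$\begin{array}{lll}
S_1 = \{\alpha_1,\alpha_4,\beta_1,\beta_4\}, & S_2 = \{\gamma_1,\gamma_4,\delta_1,\delta_4\}, & S_3 = \{\epsilon_1,\epsilon_4,\zeta_1,\zeta_4\},\\
S_4 = \{\alpha_2,\alpha_3,\beta_2,\beta_3\}, & S_5 = \{\gamma_2,\gamma_3,\delta_2,\delta_3\}, & S_6 = \{\epsilon_2,\epsilon_3,\zeta_2,\zeta_3\},\\
S_7 = \{\alpha_1,\alpha_3,\zeta_2,\zeta_4\}, & S_8 = \{\beta_2,\beta_4,\gamma_1,\gamma_3\}, & S_9 = \{\delta_2,\delta_4,\epsilon_1,\epsilon_3\},\\
S_{10} = \{\alpha_2,\alpha_4,\zeta_1,\zeta_3\}, & S_{11} = \{\beta_1,\beta_3,\gamma_2,\gamma_4\}, & S_{12} = \{\delta_1,\delta_3,\epsilon_2,\epsilon_4\},\\
S_{13} = \{\alpha_1,\alpha_2,\delta_3,\delta_4\}, & S_{14} = \{\beta_1,\beta_2,\epsilon_3,\epsilon_4\}, & S_{15} = \{\gamma_1,\gamma_2,\zeta_3,\zeta_4\},\\
S_{16} = \{\alpha_3,\alpha_4,\delta_1,\delta_2\}, & S_{17} = \{\beta_3,\beta_4,\epsilon_1,\epsilon_2\}, & S_{18} = \{\gamma_3,\gamma_4,\zeta_1,\zeta_2\}.
\end{array}$$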



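Finally let $F$ be a 4-level quantum system and for $\theta_i \in \mathcal{X}$ define
$\rho_{\theta_i}^F = |\psi_{\theta_i}\rangle\langle\psi_{\theta_i}|$ where the $|\psi_{\theta_i}\rangle$'s
are proportional to the vectors

$$\begin{array}{c|cccc}
 & 1 & 2 & 3 & 4\\
\hline
\alpha & (1,0,0,0) & (0,1,0,0) & (0,0,1,0) & (0,0,0,1)\\
\beta & (0,1,1,0) & (1,0,0,-1) & (1,0,0,1) & (0,1,-1,0)\\
\gamma & (1,1,1,1) & (1,-1,1,-1) & (1,-1,-1,1) & (1,1,-1,-1)\\
\delta & (1,-1,0,0) & (1,1,0,0) & (0,0,1,1) & (0,0,1,-1)\\
\epsilon & (-1,1,1,1) & (1,1,1,-1) & (1,-1,1,1) & (1,1,-1,1)\\
\zeta & (1,0,1,0) & (0,1,0,1) & (1,0,-1,0) & (0,1,0,-1)
\end{array}$$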



Note that distinct $|\psi_{\theta_i}\rangle$ and $|\psi_{\theta'_j}\rangle$ are orthogonal if and only
if $\theta = \theta'$ or there exists $k$, $1 \le k \le 18$, such that $\theta_i, \theta'_j \in S_k$.
In fact each of the 18 subsets $\{|\psi_{\theta_i}\rangle : \theta_i \in S_k\}$ for all $k$, as well as
the 6 subsets $\{|\psi_{\theta_i}\rangle : i = 1, \dots, 4\}$ for all $\theta \in \mathcal{M}$, forms
an orthonormal basis for the Hilbert space of $F$. To represent these orthogonality relations, form
a graph on the vertex set $\mathcal{X}$ and connect two vertices $\theta_i$ and $\theta'_j$ if
$\langle\psi_{\theta_i}|\psi_{\theta'_j}\rangle = 0$. This orthogonality graph contains 18 + 6
cliques corresponding to the above orthonormal bases, and the edge set of the graph is the union of
these cliques. The independence number of this graph, namely the maximum number of vertices no two
of which are connected, is 5.
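
These combinatorial claims can be checked mechanically. The sketch below is only an illustration: it encodes the 24 unnormalized vectors and the 18 subsets with hypothetical ASCII labels (`a, b, g, d, e, z` standing for $\alpha,\beta,\gamma,\delta,\epsilon,\zeta$), verifies the orthogonality structure with exact integer arithmetic, and computes the independence number by brute force.

```python
from itertools import combinations, product

# Unnormalized vectors psi_{theta_i}; the row labels are ASCII stand-ins (an assumption
# of this sketch) for alpha, beta, gamma, delta, epsilon, zeta.
VECTORS = {
    "a": [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)],
    "b": [(0, 1, 1, 0), (1, 0, 0, -1), (1, 0, 0, 1), (0, 1, -1, 0)],
    "g": [(1, 1, 1, 1), (1, -1, 1, -1), (1, -1, -1, 1), (1, 1, -1, -1)],
    "d": [(1, -1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, -1)],
    "e": [(-1, 1, 1, 1), (1, 1, 1, -1), (1, -1, 1, 1), (1, 1, -1, 1)],
    "z": [(1, 0, 1, 0), (0, 1, 0, 1), (1, 0, -1, 0), (0, 1, 0, -1)],
}
SUBSETS = [s.split() for s in (
    "a1 a4 b1 b4", "g1 g4 d1 d4", "e1 e4 z1 z4",
    "a2 a3 b2 b3", "g2 g3 d2 d3", "e2 e3 z2 z3",
    "a1 a3 z2 z4", "b2 b4 g1 g3", "d2 d4 e1 e3",
    "a2 a4 z1 z3", "b1 b3 g2 g4", "d1 d3 e2 e4",
    "a1 a2 d3 d4", "b1 b2 e3 e4", "g1 g2 z3 z4",
    "a3 a4 d1 d2", "b3 b4 e1 e2", "g3 g4 z1 z2",
)]
VERTICES = [f"{row}{i}" for row in VECTORS for i in range(1, 5)]


def vec(label):
    return VECTORS[label[0]][int(label[1]) - 1]


def orthogonal(u, v):
    return sum(a * b for a, b in zip(vec(u), vec(v))) == 0


def is_orthogonal_basis(labels):
    return all(orthogonal(u, v) for u, v in combinations(labels, 2))


def edges_are_union_of_cliques():
    """True if the orthogonal pairs are exactly the pairs inside the 18 + 6 cliques."""
    rows = [[f"{row}{i}" for i in range(1, 5)] for row in VECTORS]
    covered = {frozenset(pair) for clique in SUBSETS + rows for pair in combinations(clique, 2)}
    actual = {frozenset(pair) for pair in combinations(VERTICES, 2) if orthogonal(*pair)}
    return covered == actual


def independence_number():
    """Each row is a clique, so an independent set uses at most one vertex per row."""
    best = 0
    for choice in product([None, 1, 2, 3, 4], repeat=len(VECTORS)):
        picked = [f"{row}{i}" for row, i in zip(VECTORS, choice) if i is not None]
        if len(picked) > best and not any(orthogonal(u, v) for u, v in combinations(picked, 2)):
            best = len(picked)
    return best
```

The search visits only $5^6$ candidate sets, since the six rows partition the vertex set into cliques.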

Now consider the following distribution on $\mathcal{X} \times \mathcal{M} \times \mathcal{Y}$. Let
$p(\theta_i) = 1/24$ be the uniform distribution on $\mathcal{X}$. The distribution on $\mathcal{M}$
is $p(\theta'|X = \theta_i) = 1$ iff $\theta' = \theta$. To define the distribution on $\mathcal{Y}$
note that for each $\theta_i \in \mathcal{X}$ there are exactly three indices $k$ such that
$\theta_i \in S_k$. Let $p(S_k|X = \theta_i) = 1/3$ iff $\theta_i \in S_k$. Observe that the
one-shot zero-error capacity of the channel $X \to Y$ (determined by $p(S_k|X = \theta_i)$) is
$\log 5$ because the independence number of the orthogonality graph is 5 (see for
example pT] for more details). We finally define the state of $F$ to be $\rho_{\theta_i}^F$ when
$X = \theta_i$.

Now it is easy to verify that

$$H(X|F) = \log 6, \qquad H(M|F) = H(M) = \log 6, \qquad H(Y|F) = H(Y) = \log 18.$$

These equations are all based on the fact that the average of the states $\rho_{\theta_i}^F$, when
$\theta_i$ ranges over a clique of the orthogonality graph, is equal to the maximally mixed state.
So by the above notation

$$(p(x,\theta,y), \log 6, \log 6, \log 18) \in \mathrm{QConvHull}(\mathcal{G}),$$

where here $n = 3$ and $X_1 = X$, $X_2 = M$ and $X_3 = Y$. To prove
QConvHull($\mathcal{G}$) $\ne$ ConvHull($\mathcal{G}$) we show that this point does not belong to
ConvHull($\mathcal{G}$). Suppose there exists a classical random variable $C$ such that

$$H(X|C) = \log 6, \qquad H(M|C) = H(M) = \log 6, \qquad H(Y|C) = H(Y) = \log 18.$$

The above three equations imply that (the proof comes later)

$$I(C;M) = 0, \qquad H(X|CM) = 0, \qquad MC - X - Y, \qquad H(M|CY) = 0.$$

Pick a $c$ such that $p(C = c) \ne 0$ and consider the distribution $p(x,\theta,y|C = c)$. By the
first equation $p(\theta|C = c) = p(\theta) = 1/6$, and by the second equation $X$ is
deterministically computed from $M$ (and $C = c$). Using the structure of the distribution
$p(x,\theta)$ we find that for every $\theta \in \{\alpha,\beta,\gamma,\delta,\epsilon,\zeta\}$ there
exists $i$, $1 \le i \le 4$, such that $p(x|\theta, C = c) = 1$ iff $x = \theta_i$. We denote the set
of these six $\theta_i$ by $T$. Thus $|T| = 6$ and for every $\theta \in \mathcal{M}$ there exists $i$
such that $\theta_i \in T$. $MC - X - Y$ implies that
$p(y|C = c, M = \theta, X = \theta_i) = p(y|X = \theta_i)$. So $y$ is uniformly distributed among the
three subsets $S_k$ that contain $\theta_i$. Finally the last equation says that $y$ (and $C = c$)
uniquely determines $\theta$. This means that there is no $S_k$ that contains two or more elements
of $T$. As a result, $T$ is an independent set of the orthogonality graph of size 6. This is a
contradiction since the independence number of this graph is 5.

A more intuitive argument is based on a zero-error communication protocol over the channel
$X \to Y$ using $C$ as shared randomness. We take $M$ as the message to be transmitted from
the sender to the receiver. Note that by the first equation $M$ is independent of $C$ (the shared
randomness), so this analogy makes sense. $H(X|CM) = 0$ implies that $X$ is a function of $CM$.
So the sender computes $X$ from $M$ and $C$, and sends it over the channel. By $MC - X - Y$, given
the input, the output of the channel is independent of $M$ and $C$. Finally the last equation means
that the receiver can decode $M$ from $Y$ and $C$, and this can be done with no error. As a result
the one-shot zero-error capacity of $X \to Y$ is at least $H(M) = \log 6$, which is a contradiction
since this capacity is $\log 5$.

We finish this section by proving that the three equations

$$H(X|C) = \log 6, \qquad H(M|C) = H(M) = \log 6, \qquad H(Y|C) = H(Y) = \log 18,$$

imply

$$I(C;M) = 0, \qquad H(X|CM) = 0, \qquad MC - X - Y, \qquad H(M|CY) = 0.$$

The first equation directly follows from $H(M|C) = H(M) = \log 6$. To show the third and last
equations we write

$$\begin{aligned}
H(X|CY) &= H(XY|C) - H(Y|C)\\
&= H(X|C) + H(Y|CX) - H(Y)\\
&= H(X|C) + H(Y|MCX) - H(Y)\\
&\le H(X|C) + H(Y|X) - H(Y)\\
&= \log 6 + \log 3 - \log 18 = 0,
\end{aligned}$$

where in the third line we have used the fact that $M$ is uniquely determined by $X$. The
above equation implies that $H(M|CY) = 0$. Further, the inequality should hold with equality.
This gives us the constraint $MC - X - Y$. To show the second identity we write

$$H(X|CM) = H(XM|C) - H(M|C) = H(X|C) - H(M|C) = \log 6 - \log 6 = 0,$$

where we have again used the fact that $M$ is a function of $X$.



4.3 Proof of Theorem 5

By part (a) of Theorem 4 it suffices to prove the first part. Since $|\mathcal{X}| = 2$ the
distribution of $X$ is determined by $p(X = 0) = p$ and $p(X = 1) = \bar{p} = 1 - p$. $I(X;F)$ for
$p = 0$ and $p = 1$ is equal to zero, and equal to $I(X;C)$ for every $X \to C$. Therefore we only
need to show that there exists a classical channel $X \to C$ such that

$$\frac{\partial^2}{\partial p^2} I(X;F) = \frac{\partial^2}{\partial p^2} I(X;C)$$

at every $p$. On the other hand, since $I(X;F)$ and $H(F)$ differ only by a linear function of
$p$, it is sufficient to show that there exists a classical channel $X \to C$ with

$$\frac{\partial^2}{\partial p^2} H(F) = \frac{\partial^2}{\partial p^2} H(C).$$

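Let $\vec{s} = (s_1,s_2,s_3)$ and $\vec{r} = (r_1,r_2,r_3)$ be the Bloch sphere representations of
$\rho_0$ and $\rho_1$ respectively, i.e.,

$$\rho_0 = \frac{1}{2}I + \frac{1}{2}(s_1\sigma_x + s_2\sigma_y + s_3\sigma_z), \qquad
\rho_1 = \frac{1}{2}I + \frac{1}{2}(r_1\sigma_x + r_2\sigma_y + r_3\sigma_z),$$

where $\sigma_x, \sigma_y, \sigma_z$ are Pauli matrices. If $\vec{r} = \vec{s}$, then
$\rho_0 = \rho_1$ and the existence of $C$ is immediate. Thus we assume $\vec{r} \ne \vec{s}$. The
marginal of $F$ is equal to

$$\rho = \rho_p = p\rho_0 + \bar{p}\rho_1 = \frac{1}{2}I + \frac{1}{2}\big((\bar{p}r_1 + ps_1)\sigma_x + (\bar{p}r_2 + ps_2)\sigma_y + (\bar{p}r_3 + ps_3)\sigma_z\big),$$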



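so the eigenvalues of $\rho$ are

$$\lambda = \lambda_p = \frac{1 + \|p\vec{s} + \bar{p}\vec{r}\|}{2}$$

and $1 - \lambda$. Therefore, $H(F) = h(\lambda)$ where $h(\cdot)$ denotes the binary entropy
function. The second derivative of the binary entropy function is computed as

$$\frac{\partial^2}{\partial p^2} h(\lambda) = -\lambda''\big(\ln\lambda - \ln(1-\lambda)\big) - \lambda'^2\,\frac{1}{\lambda(1-\lambda)}, \qquad (12)$$

where $\lambda' = \frac{\partial}{\partial p}\lambda$ and $\lambda'' = \frac{\partial^2}{\partial p^2}\lambda$,
and for simplicity we take the natural logarithm instead of the logarithm in base 2.

Let us define

$$Z = Z_p = \|p\vec{s} + \bar{p}\vec{r}\|^2 = \|\vec{s} - \vec{r}\|^2 p^2 + 2\big(\vec{r}\cdot(\vec{s} - \vec{r})\big)p + \|\vec{r}\|^2$$

and $\Delta = \|\vec{r}\|^2 \cdot \|\vec{s} - \vec{r}\|^2 - (\vec{r}\cdot(\vec{s} - \vec{r}))^2$. By the
Cauchy-Schwarz inequality $\Delta \ge 0$, and $\Delta \le \|\vec{s} - \vec{r}\|^2$ since
$\|\vec{s}\|, \|\vec{r}\| \le 1$. Then $\lambda = \frac{1 + \sqrt{Z}}{2}$ and

$$\lambda' = \frac{Z'}{4\sqrt{Z}}, \qquad \lambda'' = \frac{2Z''Z - Z'^2}{8Z^{3/2}},$$

where $Z' = \frac{\partial}{\partial p}Z$ and $Z'' = \frac{\partial^2}{\partial p^2}Z$. Putting these
in (12) we obtain

$$\frac{\partial^2}{\partial p^2} H(F) = -\frac{2Z''Z - Z'^2}{8Z^{3/2}}\Big(\ln(1 + \sqrt{Z}) - \ln(1 - \sqrt{Z})\Big) - \frac{Z'^2}{4Z(1 - Z)}.$$

Using the Taylor expansions $\ln(1 + Z) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} Z^k$ and
$\frac{1}{1 - Z} = \sum_{k=0}^{\infty} Z^k$ we find that

$$\frac{\partial^2}{\partial p^2} H(F) = -\frac{Z''}{2}\sum_{k=0}^{\infty}\frac{Z^k}{2k+1} - \frac{Z'^2}{4}\sum_{k=0}^{\infty}\frac{2k+2}{2k+3}Z^k.$$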



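Finally, using the definition of $Z$ we conclude that

$$\frac{\partial^2}{\partial p^2} H(F) = -\|\vec{s} - \vec{r}\|^2\left[\frac{\Delta}{\|\vec{s} - \vec{r}\|^2}\sum_{k=0}^{\infty}\frac{Z^k}{2k+3} + \left(1 - \frac{\Delta}{\|\vec{s} - \vec{r}\|^2}\right)\sum_{k=0}^{\infty} Z^k\right].$$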
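
As a sanity check of this series, one can compare it with a finite-difference second derivative of $H(F) = h(\lambda_p)$ in nats. The sketch below is a numerical illustration only: it assumes the closed form $\frac{\partial^2}{\partial p^2}H(F) = -\|\vec{s}-\vec{r}\|^2\big[\frac{\Delta}{\|\vec{s}-\vec{r}\|^2}\sum_k \frac{Z^k}{2k+3} + (1 - \frac{\Delta}{\|\vec{s}-\vec{r}\|^2})\sum_k Z^k\big]$, and the Bloch vectors, step size and truncation length are hypothetical choices.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def entropy_f(p, s, r):
    """H(F) in nats for rho_p = p*rho_0 + (1-p)*rho_1 with Bloch vectors s and r."""
    mix = [p * si + (1.0 - p) * ri for si, ri in zip(s, r)]
    lam = (1.0 + math.sqrt(dot(mix, mix))) / 2.0
    return -sum(x * math.log(x) for x in (lam, 1.0 - lam) if x > 0)


def second_derivative_numeric(p, s, r, step=1e-4):
    """Central finite-difference estimate of the second derivative of H(F) in p."""
    return (entropy_f(p + step, s, r) - 2.0 * entropy_f(p, s, r)
            + entropy_f(p - step, s, r)) / step ** 2


def second_derivative_series(p, s, r, terms=400):
    """Truncated series form of the second derivative (assumed closed form, see lead-in)."""
    diff = [si - ri for si, ri in zip(s, r)]
    norm2 = dot(diff, diff)                  # ||s - r||^2
    cross = dot(r, diff)                     # r . (s - r)
    z = norm2 * p * p + 2.0 * cross * p + dot(r, r)
    weight = (dot(r, r) * norm2 - cross ** 2) / norm2   # Delta / ||s - r||^2
    geometric = sum(z ** k for k in range(terms))
    damped = sum(z ** k / (2 * k + 3) for k in range(terms))
    return -norm2 * (weight * damped + (1.0 - weight) * geometric)


# Hypothetical mixed states (Bloch vectors strictly inside the unit ball).
S_VEC = (0.3, 0.2, 0.5)
R_VEC = (-0.4, 0.1, 0.2)
```

The two functions agree closely for these vectors; the series converges slowly when $Z$ approaches 1, i.e. near pure states, so the truncation would need care there.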



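A classical channel $X \to D$ for $|\mathcal{D}| = 2$ is determined by

$$p(D = 0|X = 0) = a, \qquad p(D = 0|X = 1) = b.$$

We denote such a $D$ by $D_{a,b}$. Then $H(D_{a,b}) = h(ap + b\bar{p})$ and

$$\frac{\partial^2}{\partial p^2} H(D_{a,b}) = \frac{-(a-b)^2}{-(a-b)^2p^2 + (a-b)(1-2b)p + b(1-b)}.$$

Let us assume that

$$\frac{\vec{r}\cdot(\vec{s} - \vec{r})}{\|\vec{s} - \vec{r}\|^2}\,(a - b) = b - \frac{1}{2}. \qquad (13)$$

Then

$$\frac{\partial^2}{\partial p^2} H(D_{a,b}) = \frac{-\|\vec{s} - \vec{r}\|^2}{\frac{b(1-b)\|\vec{s} - \vec{r}\|^2}{(a-b)^2} + \|\vec{r}\|^2 - Z}
= -\|\vec{s} - \vec{r}\|^2\sum_{k=0}^{\infty}\frac{Z^k}{\left(\frac{b(1-b)\|\vec{s} - \vec{r}\|^2}{(a-b)^2} + \|\vec{r}\|^2\right)^{k+1}}.$$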
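We now claim that for every $0 \le \theta \le 1$ there exist $0 \le a_\theta, b_\theta \le 1$ that
satisfy (13) and

$$\left(\frac{b_\theta(1 - b_\theta)\|\vec{s} - \vec{r}\|^2}{(a_\theta - b_\theta)^2} + \|\vec{r}\|^2\right)^{-1} = \theta.$$

By continuity we only need to prove the claim for $\theta = 0$ and $\theta = 1$. For $\theta = 0$
take $a_0 = b_0 = 1/2$, and for $\theta = 1$ take

$$a_1 = \frac{1}{2} + \frac{\|\vec{s} - \vec{r}\|^2 + \vec{r}\cdot(\vec{s} - \vec{r})}{2\sqrt{\|\vec{s} - \vec{r}\|^2(1 - \|\vec{r}\|^2) + (\vec{r}\cdot(\vec{s} - \vec{r}))^2}}, \qquad
b_1 = \frac{1}{2} + \frac{\vec{r}\cdot(\vec{s} - \vec{r})}{2\sqrt{\|\vec{s} - \vec{r}\|^2(1 - \|\vec{r}\|^2) + (\vec{r}\cdot(\vec{s} - \vec{r}))^2}}.$$

Using $\|\vec{s}\|, \|\vec{r}\| \le 1$, it is easy to see that $0 \le a_1, b_1 \le 1$. We thus have

$$\frac{\partial^2}{\partial p^2} H(D_{a_\theta,b_\theta}) = -\|\vec{s} - \vec{r}\|^2\sum_{k=0}^{\infty}\theta^{k+1}Z^k.$$

Now define a channel $X \to C$ which with probability $1 - \frac{\Delta}{\|\vec{s} - \vec{r}\|^2}$ is
equal to $X \to D_{a_1,b_1}$, and with probability $\frac{\Delta}{\|\vec{s} - \vec{r}\|^2}$ is equal to
the channel $X \to D_{a_{\theta^2},b_{\theta^2}}$ where $0 \le \theta \le 1$ is chosen uniformly at
random. Observe that

$$\begin{aligned}
\frac{\partial^2}{\partial p^2} H(C) &= \frac{\Delta}{\|\vec{s} - \vec{r}\|^2}\int_0^1 \frac{\partial^2}{\partial p^2} H(D_{a_{\theta^2},b_{\theta^2}})\,d\theta + \left(1 - \frac{\Delta}{\|\vec{s} - \vec{r}\|^2}\right)\frac{\partial^2}{\partial p^2} H(D_{a_1,b_1})\\
&= -\|\vec{s} - \vec{r}\|^2\left[\frac{\Delta}{\|\vec{s} - \vec{r}\|^2}\int_0^1\sum_{k=0}^{\infty}\theta^{2k+2}Z^k\,d\theta + \left(1 - \frac{\Delta}{\|\vec{s} - \vec{r}\|^2}\right)\sum_{k=0}^{\infty}Z^k\right]\\
&= -\|\vec{s} - \vec{r}\|^2\left[\frac{\Delta}{\|\vec{s} - \vec{r}\|^2}\sum_{k=0}^{\infty}\frac{Z^k}{2k+3} + \left(1 - \frac{\Delta}{\|\vec{s} - \vec{r}\|^2}\right)\sum_{k=0}^{\infty}Z^k\right]\\
&= \frac{\partial^2}{\partial p^2} H(F).
\end{aligned}$$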

4.4 Dimension Bounds 

Bounding the dimension of an auxiliary quantum register in an optimization problem seems to be a
more difficult problem than its classical counterpart. In Section 3.5, for example, we see that
even for the simple CHSH-type correlations we do not have a bound on the dimension of $F$
satisfying Q. Here we briefly discuss that such a bound may not exist in general.

Consider the $I_{3322}$ Bell inequality. Based on evidence from numerical simulations in [23],
it is conjectured that the maximum violation of this inequality in the quantum world is not
attained in finite dimensions. Here, assuming this conjecture, we show that in general there is no
bound on the dimension of the auxiliary $F$ of Q .

Let $p(a,b|x,y)$ be the non-local quantum correlation that maximally violates the $I_{3322}$
inequality, so two parties given $x, y$ can locally measure a shared entangled state to output
$a, b$ with distribution $p(a,b|x,y)$. Note that this is a one-shot zero-error simulation of the
distribution, and by the conjecture it cannot be realized with a finite dimensional shared
entangled state if there is no communication. Now the question is whether the restriction on the
dimension of the shared state still holds if we consider the information theoretic setting where we
allow for an asymptotically vanishing error. To be more precise let us fix a distribution
$p(x,y) = p(x)p(y)$
on X.Y. If such a dimension bound exists then So = Ploo^ = ® n tne other hand by 
the definition of p(a, b, x, y), € lZ q . This means that there exists an auxiliary F satisfying (HI) 
such that $I(X;F|Y) = 0$. Now looking at the details of the achievability part of Theorem 2,
error enters only in the remote state preparation part (see the proof of Theorem 3), but when
$I(X;F|Y) = 0$ we basically do not need this part, and no error enters the simulation. In other
words, when $I(X;F|Y) = 0$ the single-shot zero-error and the information theoretic settings are
equivalent; if there is no dimension bound on the former, there is no bound on the latter as well.

This observation suggests that proving dimension bounds on quantum registers in general
may not be possible. Nevertheless, for simpler problems like the one considered in the previous
subsections we may use the ideas developed there. Let us again consider the optimization
problem

$$\sup_{F-X-YZ} I(F;Y) - I(F;Z),$$

and assume that we can restrict to quantum registers with dimension bound
$d = d(|\mathcal{X}|, |\mathcal{Y}|, |\mathcal{Z}|)$. This means that an $F$ with dimension larger
than $d$ can be replaced with a smaller one. In fact, similar arguments as above indicate that for
every $\epsilon > 0$ there are quantum registers $F_1, F_2, \dots, F_{M_\epsilon}$ with dimensions
less than or equal to $d$, and non-negative weights $\lambda_1, \dots, \lambda_{M_\epsilon}$ such that
$\sum_m \lambda_m = 1$ and that the function

$$p(x) \mapsto H(F) - \sum_m \lambda_m H(F_m)$$

is $\epsilon$-linear (meaning that its upper concave and lower convex envelopes differ by at most
$\epsilon$ in the sense of (11)).

To gain some intuition let us assume that the function is perfectly linear in $p(x)$. Thus there
are coefficients $\mu_x$ such that$^3$

$$H(F) = \sum_m \lambda_m H(F_m) + \sum_x \mu_x p(x), \qquad \forall p(x).$$

Note that the coefficients $\lambda_m$ and $\mu_x$ as well as the registers $F_m$ depend on $F$, but
the above relation has to hold for all $p(x)$. This equation suggests that to study dimension
bounds, it would help to understand the behavior of the concave function $p(x) \mapsto H(F)$ in
terms of the dimension of $F$. Indeed the question of finding dimension bounds is reduced to the
question of whether or not these functions become more and more complicated (in structure) as we
increase the dimension, so that they cannot be written in terms of those functions with smaller
dimensions.



4.5 Multi-letter Convex Hull 

The problem of convexification (conditioning on an auxiliary register) can be generalized to the
multi-letter case as follows. For every natural number $m$ define $m$-letter-ConvHull($\mathcal{G}$)
to be

$$\Big\{\Big(p(x_1,\dots,x_n),\ \frac{1}{m}H(X_1^m|C),\ \dots,\ \frac{1}{m}H(X_n^m|C)\Big) : \text{for all } \prod_{i=1}^{m} p(x_{1i},\dots,x_{ni})\, p(c|x_1^m, x_2^m, \dots, x_n^m)\Big\}.$$

Multi-letter convex hull is defined as the closure of the union of $m$-letter-ConvHull($\mathcal{G}$)
over all natural numbers $m$:

$$\text{Multi-letter-ConvHull}(\mathcal{G}) = \overline{\bigcup_m\ m\text{-letter-ConvHull}(\mathcal{G})}.$$

Multi-letter convex hull has no operational meaning. Nonetheless we can view it as a particular
relaxation of ConvHull($\mathcal{G}$) with relevance in information theory. To underscore its
importance, note that the capacity region of the 3-receiver broadcast channels with 2-degraded
message sets (an open problem) can be expressed in terms of the multi-letter convex hull
region.$^4$ No single-letter expression for the multi-letter convex hull region is known for
$n > 2$. For the case of $n = 3$ when $X_3 = (X_1, X_2)$ the region can be deduced from Theorem 1
of [25]. As a corollary to this

3 We avoid adding a constant term because $\sum_x p(x) = 1$.

4 To see this observe that the capacity region is equal to the multi-letter version of the inner bound (Bound 1)
in [24]. This inner bound contains only one auxiliary random variable $U$. The constraint $U - X - Y_1Y_2Y_3$ can
be expressed as $H(Y_1Y_2Y_3X|U) - H(X|U) = H(Y_1Y_2Y_3|X)$ in terms of entropies conditioned on $U$.
Theorem, we can explicitly find the multi-letter convex hull region for $n = 2$ by considering only
the first two coordinates

$$\Big(\frac{1}{m}H(X_1^m|C),\ \frac{1}{m}H(X_2^m|C)\Big). \qquad (14)$$

That is, for a given $p(x_1,x_2)$, we may consider the closure of the set of points (14) over all
$m$ and $p(c|x_1^m, x_2^m)$. This two-dimensional region has a very simple description. It is the
convex hull of four points:

$$(0,0), \qquad (H(X_1),H(X_2)), \qquad (I(X_1;X_2),H(X_2)), \qquad (H(X_1),I(X_2;X_1)).$$

Thus, while ConvHull($\mathcal{G}$) has a curvy boundary for $n = 2$, the multi-letter region has
straight boundary lines.
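
For a concrete $p(x_1,x_2)$ the four corner points are easy to evaluate. The sketch below computes them in bits; the function name `corner_points` and the example distribution are hypothetical, and the code only evaluates the stated corner points without verifying the multi-letter characterization itself.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def corner_points(joint_x1x2):
    """Corner points (0,0), (H1,H2), (I,H2), (H1,I) for a dict {(x1, x2): probability}."""
    p1, p2 = defaultdict(float), defaultdict(float)
    for (x1, x2), p in joint_x1x2.items():
        p1[x1] += p
        p2[x2] += p
    h1, h2 = entropy(p1), entropy(p2)
    mutual_info = h1 + h2 - entropy(joint_x1x2)
    return [(0.0, 0.0), (h1, h2), (mutual_info, h2), (h1, mutual_info)]


# Hypothetical example: two independent uniform bits, for which I(X1;X2) = 0.
INDEPENDENT_BITS = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
```

For independent uniform bits the corners are $(0,0)$, $(1,1)$, $(0,1)$ and $(1,0)$, so the region is the whole unit square; for dependent variables the last two corners move inward along the sides.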

Both Multi-letter-ConvHull($\mathcal{G}$) and QConvHull($\mathcal{G}$) are outer bounds for
ConvHull($\mathcal{G}$). The next lemma addresses a relation between these two sets.

Lemma 6 There are examples where

$$\text{Multi-letter-ConvHull}(\mathcal{G}) \not\subseteq \mathrm{QConvHull}(\mathcal{G}).$$

Proof: Consider the case of $n = 2$ and binary $X_1$ and $X_2$. Take some $p(x_1,x_2)$ where $X_1$
and $X_2$ are not independent and $X_1 \ne X_2$. Take the point $(I(X_1;X_2), H(X_2))$ in
Multi-letter-ConvHull($\mathcal{G}$). We need to show that there is no quantum $F$ such that
$H(X_1|F) = I(X_1;X_2)$ and $H(X_2|F) = H(X_2)$. We have

$$H(X_1|X_2) = I(X_1;F) \le I(X_1X_2;F) = I(X_2;F) + I(X_1;F|X_2) = I(X_1;F|X_2) \le H(X_1|X_2).$$

For equality to hold we must have $I(X_2;F|X_1) = 0$, so the state of $F$ is independent of $X_2$,
i.e., $\rho_{x_1,x_2} = \rho_{x_1}$, and it is determined by $\rho_0 = \rho_{X_1=0}$ and
$\rho_1 = \rho_{X_1=1}$. Moreover, $H(X_2|F) = H(X_2)$ implies that

$$p(X_1 = 0|X_2 = 0)\rho_0 + p(X_1 = 1|X_2 = 0)\rho_1 = p(X_1 = 0|X_2 = 1)\rho_0 + p(X_1 = 1|X_2 = 1)\rho_1.$$

This implies that $\rho_0 = \rho_1$, and hence $F$ is independent of $X_1$, which is a contradiction.

□

It is interesting to investigate the possibility of

$$\mathrm{QConvHull}(\mathcal{G}) \not\subseteq \text{Multi-letter-ConvHull}(\mathcal{G}).$$

This could imply that the entanglement-assisted rate region of a classical problem is strictly
larger than the randomness-assisted one, for a problem for which no single-letter capacity
expression is available.



5 Conclusion 

We studied the role of entanglement in classical communication problems from an information
theoretic point of view. We observed that, unlike the problem of point-to-point channel capacity,
entanglement does help in the problem of simulation of correlations. We in fact found the
entanglement-assisted one-way communication cost of simulation of bipartite correlations. The
rate region of this problem is determined by expressions similar to those of the
randomness-assisted case, where an auxiliary random variable is replaced by a quantum register. We
then considered the Gray-Wyner problem and observed that the entanglement-assisted rate region is
related to the usual rate region, where again a classical auxiliary register is replaced by a
quantum one. Motivated by the structure of these rate regions we formalized the problem of quantum
convexification. We showed that quantum convexification coincides with (classical) convexification
in a very special case but in general goes beyond it. We argued that the techniques developed in
this part may give intuition on the problem of bounding the dimension of auxiliary quantum
registers. We then considered the convexification problem in the multi-letter case. The example,
which is based on the increase of the zero-error capacity by entanglement, cannot be generalized to
the multi-letter case because entanglement does not increase the usual capacity. So an interesting
question here is whether the multi-letter quantum convexification can go beyond the multi-letter
classical convexification.
Appendix
A Proof of Lemmas 4 and 5

Proof of Lemma [3J Observe that 

(of) (of)' = (of Y = 2"W F i x >- 5 ") (nf Px ^) T < (nf ) T < i. 



Thus {Qg , y/l — (Qf) 2 } defines a measurement. Moreover, 



tr* 



(Of ®V) |V>XVW (of®. 



' • i (of ) ) / 

2 n(H(F|X)-5") 



dimm 



Now the result follows from two applications of GML. Define



TP /i „TT™ 



TT™ /)' 17™ 



^ tr(nf^.n»)' ^ tr(n« P ;„n«)- 

We have ||p x „ - Uf p B »Iiy||i < 2^i. Thus \\p x n - p'^ < and 

tr (p^nf ) > tr (p^Ilf ) - > 1 - e - 2^. 
Therefore, \\p' xn — PxA\i — 2\A + 2y/e, and by the triangle inequality 

IIP.- -9% ||i < 2^+^ + 2^) <6 v / i. 



We can work out equation (15 1 as follows: 



2 n(H(F\x)-8") . . 2 ™( ff ( F l x )- <5 ") / „ \ 



(15) 



1 5 X / UlllliJ^ 

This means that after measurement subsystem F' collapses to p"„ with probability 

dimnw tr (Ufp xn ) tr (H"pU) > 2-™*>- 5 - ff ( F )-^)(l - e )(l - (e + 2^)) 

= 2 -"( / ( F;X )+ 5 "+ c5 )(l - e )(l - (e+ 2Vi)) 

> 2 -n(/(F;X)+5"+c5)/ 1 _ 



Proof of Lemma OD First note that 



tr 



< 



dimnf 2"W F l y )+ 5 " 

< 



dimn n 



dim TI" 



< _J_2-n{I{F;Y)-S" -cS) 
~ 1 - e 



Let m = 2"( / ( F ; Y )- ,5 "- c ' 5 - Q ) and p = -L^-"^^- 5 "-^. Thus probability of obtaining one 
or zero II? is at most the probability of obtaining no Id? which is greater than or equal to 



(1 — p) m = e mln ( 1 ~P) > e -™P/( 1 -P) > g- 2m P = g- 2 " a + 1 /(l-e) > I _ 



1-e 
Figure 2: The Gray-Wyner game consists of $n + 1$ players, Alice and $n$ Bobs who are indexed
by $i = 1, \dots, n$. Alice receives the i.i.d. copies of $X_1, \dots, X_n$, sends public information
at rate $R_0$ to all Bobs and private information at rate $R_i$ to Bob$_i$. The goal of Bob$_i$ is to
recover $X_i$.



B Entanglement-assisted Gray-Wyner region 

In the Gray-Wyner problem Alice observes i.i.d. copies of $X_1, \dots, X_n$, and can send a public message at rate $R_0$ to all Bobs, and $n$ private messages at rates $R_1, R_2, \dots, R_n$ (at rate $R_i$ to Bob$_i$). The goal is for Bob$_i$ to recover the i.i.d. copies of $X_i$ with probability of error converging to zero as the number of i.i.d. observations goes to infinity (see Fig. 2). The Gray-Wyner region is defined to be the set of achievable rate vectors $(R_0, R_1, \dots, R_n)$, i.e., $(R_0, R_1, \dots, R_n)$ is achievable if by sending public and private information at rates $R_0$ and $R_1, \dots, R_n$ respectively, the Bobs' demands can be fulfilled.

Recently Winter (private communication) has shown that the capacity region of the entanglement-assisted Gray-Wyner problem, $C^{q}_{GW}$, is characterized by tuples

$$(I(X_1 \dots X_n; F_1 \dots F_n),\ H(X_1|F_1), \dots, H(X_n|F_n)), \qquad (16)$$

where $F_1, \dots, F_n$ are $n$ arbitrary auxiliary quantum registers.$^5$ The classical Gray-Wyner region, $C^{c}_{GW}$, is characterized by tuples

$$(I(X_1 \dots X_n; C),\ H(X_1|C), \dots, H(X_n|C)), \qquad (17)$$

where $C$ is an arbitrary auxiliary random variable [16]. To make the two regions closer in expression we consider a third region $\tilde{C}^{q}_{GW}$ characterized by tuples

$$(I(X_1 \dots X_n; F),\ H(X_1|F), \dots, H(X_n|F)), \qquad (18)$$

where $F$ is an arbitrary auxiliary quantum register. We have $C^{c}_{GW} \subseteq C^{q}_{GW} \subseteq \tilde{C}^{q}_{GW}$: since $H(X_i|F_i) \ge H(X_i|F_1 \dots F_n)$, we can identify $F$ with $F_1 \dots F_n$.

If we assume that the auxiliary register in $\tilde{C}^{q}_{GW}$ is a classical random variable, we obtain the classical capacity region of the Gray-Wyner problem. Thus if the auxiliary quantum register cannot be replaced by a classical random variable when optimizing the expression, we will be able to conclude that the quantum and classical regions differ, meaning that shared entanglement helps.



C Technical Details of Section 4

The following three lemmas complete the argument of Section 4 and use the same notation.

Lemma 7 There exists a finite set of points $p(c_m|x)$, $m = 1, 2, \dots, M_\epsilon$, such that for every $p(c|x) \in \mathcal{P}$ there exists $m$ such that

$$\big|\,(I(C;Y) - I(C;Z)) - (I(C_m;Y) - I(C_m;Z))\,\big| < \epsilon \qquad \forall\, q(y,z|x).$$

5 The converse follows from steps similar to those in the classical Gray-Wyner problem. The achievability follows from a remote state preparation protocol (such as that of Theorem 3 but with no side information) together with the quantum-classical Slepian-Wolf theorem.
Proof: Define $T(p(x), p(c|x)) = H(C)$ for all $p(x)$ and $p(c|x) \in \mathcal{P}$. $T$ is a continuous function defined on a compact set. Thus $T$ is uniformly continuous, i.e., for every $\epsilon > 0$ there exists $\delta > 0$ such that if $\|(p(x), p(c|x)) - (p(x'), p(c'|x))\|_1 < \delta$, then $|T(p(x), p(c|x)) - T(p(x'), p(c'|x))| < \epsilon/2$.

On the other hand, the set of points $p(c|x)$ is compact. Thus, there exists a $\delta$-net, i.e., there exists a finite set of points $p(c_m|x)$, $m = 1, \dots, M_\epsilon$, such that for every $p(c|x)$ there exists $m$ with $\|p(c|x) - p(c_m|x)\|_1 < \delta$. This implies that for every $p(x)$,

$$|T(p(x), p(c|x)) - T(p(x), p(c_m|x))| < \epsilon/2.$$

Now note that

$$I(C;Y) - I(C;Z) = H(C|Z) - H(C|Y) = \sum_z p(z)\,T(p(x|z), p(c|x)) - \sum_y p(y)\,T(p(x|y), p(c|x))$$

is the difference of two convex combinations of evaluations of $T$. Replacing $p(c|x)$ by $p(c_m|x)$ changes each convex combination by less than $\epsilon/2$, so the difference changes by less than $\epsilon$. We are done. □
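The identity $I(C;Y) - I(C;Z) = \sum_z p(z)\,T(p(x|z),p(c|x)) - \sum_y p(y)\,T(p(x|y),p(c|x))$ can be checked numerically. The following is a minimal sketch, assuming finite alphabets and the Markov chain $C - X - (Y,Z)$; the helper names (`entropy`, `T`, `both_sides`) and the example distributions are illustrative and not taken from the paper.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def T(p_x, p_c_given_x):
    """H(C) when X ~ p_x and C is generated from X through p(c|x)."""
    n_c = len(p_c_given_x[0])
    p_c = [sum(p_x[x] * p_c_given_x[x][c] for x in range(len(p_x))) for c in range(n_c)]
    return entropy(p_c)

def mutual_information(joint):
    """I(A;B) for a joint probability table joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return entropy(p_a) + entropy(p_b) - entropy(p for row in joint for p in row)

def both_sides(p_x, p_c_given_x, q):
    """Return (I(C;Y) - I(C;Z), sum_z p(z) T(p(x|z), .) - sum_y p(y) T(p(x|y), .)).

    q[x][y][z] is the channel q(y,z|x).
    """
    nx, nc = len(p_x), len(p_c_given_x[0])
    ny, nz = len(q[0]), len(q[0][0])
    # Joint tables p(c,y) and p(c,z) under the Markov chain C - X - (Y,Z).
    p_cy = [[sum(p_x[x] * p_c_given_x[x][c] * sum(q[x][y]) for x in range(nx))
             for y in range(ny)] for c in range(nc)]
    p_cz = [[sum(p_x[x] * p_c_given_x[x][c] * sum(q[x][y][z] for y in range(ny))
                 for x in range(nx)) for z in range(nz)] for c in range(nc)]
    lhs = mutual_information(p_cy) - mutual_information(p_cz)

    def averaged_T(p_xv):
        """sum_v p(v) T(p(x|v), p(c|x)) for a joint table p_xv[x][v]."""
        total = 0.0
        for v in range(len(p_xv[0])):
            p_v = sum(p_xv[x][v] for x in range(nx))
            if p_v > 0:
                total += p_v * T([p_xv[x][v] / p_v for x in range(nx)], p_c_given_x)
        return total

    p_xy = [[p_x[x] * sum(q[x][y]) for y in range(ny)] for x in range(nx)]
    p_xz = [[p_x[x] * sum(q[x][y][z] for y in range(ny)) for z in range(nz)]
            for x in range(nx)]
    rhs = averaged_T(p_xz) - averaged_T(p_xy)
    return lhs, rhs

# Illustrative binary example (values chosen arbitrarily).
p_x = [0.3, 0.7]
p_c_given_x = [[0.9, 0.1], [0.2, 0.8]]
q = [[[0.5, 0.1], [0.1, 0.3]], [[0.2, 0.2], [0.4, 0.2]]]
lhs, rhs = both_sides(p_x, p_c_given_x, q)
assert abs(lhs - rhs) < 1e-9
```

The check relies on $p(c|z) = \sum_x p(x|z)\,p(c|x)$, which is exactly where the Markov chain assumption enters.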



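Lemma 8 With the notation developed above, we have

$$\min_{\lambda_m \ge 0:\ \sum_m \lambda_m = 1}\ \max_{q(y,z|x)} \Big[ I(F;Y) - I(F;Z) - \sum_m \lambda_m \big(I(C_m;Y) - I(C_m;Z)\big) \Big]$$

$$= \max_{q(y,z|x)}\ \min_{\lambda_m \ge 0:\ \sum_m \lambda_m = 1} \Big[ I(F;Y) - I(F;Z) - \sum_m \lambda_m \big(I(C_m;Y) - I(C_m;Z)\big) \Big].$$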



Proof: To show the legitimacy of exchanging maximum and minimum we use Corollary 2 of [26] with the choice of $T_m(q(y,z|x)) = I(F;Y) - I(F;Z) - I(C_m;Y) + I(C_m;Z)$, and $d = M_\epsilon$. To apply Corollary 2 we need to show the convexity of the set

$$A = \{(a_1, \dots, a_{M_\epsilon}) : a_m \le I(F;Y) - I(F;Z) - I(C_m;Y) + I(C_m;Z) \text{ for some } q(y,z|x)\}.$$

Take two arbitrary points in $A$. We show their average is in $A$. Corresponding to these two points are two channels $q(y_1, z_1|x)$ and $q(y_2, z_2|x)$. We construct $q(y_0, z_0|x)$ as follows. Let $U$ be the uniform binary random variable on $\{1,2\}$, independent of all previously defined registers. Let $Y_0 = (U, Y_U)$ and $Z_0 = (U, Z_U)$. Then it is easy to verify that

$$T_m(q(y_0, z_0|x)) = \frac{1}{2}\big(T_m(q(y_1, z_1|x)) + T_m(q(y_2, z_2|x))\big).$$

This implies that the average of the two points is in $A$.
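The averaging property of the time-sharing construction can be illustrated numerically for the classical terms. The sketch below checks only that $I(C;Y_0) = \frac{1}{2}(I(C;Y_1) + I(C;Y_2))$ when $Y_0 = (U, Y_U)$ for a classical $C$ with $C - X - Y$; the terms involving the quantum register $F$ are not simulated, and all function names and example channels are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_information(joint):
    """I(A;B) for a joint probability table joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return entropy(p_a) + entropy(p_b) - entropy(p for row in joint for p in row)

def joint_cy(p_x, p_c_given_x, q_y_given_x):
    """Joint table p(c,y) for the Markov chain C - X - Y."""
    nx, nc, ny = len(p_x), len(p_c_given_x[0]), len(q_y_given_x[0])
    return [[sum(p_x[x] * p_c_given_x[x][c] * q_y_given_x[x][y] for x in range(nx))
             for y in range(ny)] for c in range(nc)]

def time_shared(q1, q2):
    """Channel x -> (U, Y_U) with U uniform on {1,2}.

    The pair (u, y) is flattened: outputs of q1 come first, then outputs of q2.
    """
    return [[0.5 * p for p in row1] + [0.5 * p for p in row2]
            for row1, row2 in zip(q1, q2)]

# Illustrative binary example (values chosen arbitrarily).
p_x = [0.3, 0.7]
p_c_given_x = [[0.9, 0.1], [0.2, 0.8]]
q1 = [[0.8, 0.2], [0.3, 0.7]]
q2 = [[0.6, 0.4], [0.1, 0.9]]
i1 = mutual_information(joint_cy(p_x, p_c_given_x, q1))
i2 = mutual_information(joint_cy(p_x, p_c_given_x, q2))
i0 = mutual_information(joint_cy(p_x, p_c_given_x, time_shared(q1, q2)))
assert abs(i0 - 0.5 * (i1 + i2)) < 1e-9
```

The equality holds because $U$ is independent of $C$, so $I(C; U, Y_U) = I(C; Y_U | U)$, which is the average of the two individual mutual informations.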